REPLIES OF PSYCHOLOGISTS TO A SHORT QUESTIONNAIRE
ON MENTAL TEST DEVELOPMENTS, PERSONALITY INVENTORIES,
AND THE RORSCHACH TEST


ARTHUR KORNHAUSER 
Bureau of Applied Social Research, Columbia University 


In April 1944 a brief questionnaire was sent to 85 selected 
specialists on mental tests whose views might be considered 
representative of highly competent thought in that field. 
Seventy-nine completed blanks were returned (93 per cent). 

The first part of the question-blank asked a few questions 
about the practical value of intelligence tests. This material 
will be summarized in a later report in this journal as well as 
in more popular form elsewhere.¹ 

The second half of the questionnaire, to be reported here, 
contained somewhat more technical questions. These were 
intended for report within the profession only, not for the 
general public. 

Both the formulation of the questions and the selection of 
the expert panel were based upon personal conferences with 
six specialists who served as advisors. The final list of experts 
represents the pooled judgment of these advisors.² 

1 The set of questions on intelligence testing constituted one of a series of “polls 
of experts” which aim to ascertain and report to the public the conclusions of a 
cross-section of leading authorities on questions in their special fields. A continuing 
project of this kind, it is believed, may help reduce the lag of public thinking 
behind the views of the well-informed. The polls are intended for prompt pub- 
lication in a mass-circulation magazine. While publication arrangements have been 
delayed during the initial stages, plans are now completed for having the poll 
reports appear monthly in The American Magazine. 

2 The cooperation of the mental test authorities who participated is gratefully 
acknowledged. In addition to those in the following list, three others requested 
that their names be not listed. 


Dr. Dorothy C. Adkins          Dr. Albert K. Kurtz 
Dr. Anne Anastasi              Dr. E. F. Lindquist 
Dr. Rose G. Anderson           Dr. Irving Lorge 
Dr. Grace Arthur               Comdr. C. M. Louttit 
A summary of responses to the second set of questions 
follows. 


Question 1 


In the further development of mental ability testing for prac- 
tical use in schools and in business, do you think most will be 
accomplished if psychologists concentrate on measuring sepa- 
rate intellectual factors or if they continue to emphasize the 
measurement of “general” intelligence? 





                                              No. of replies 
Separate factors ......................... 
General intelligence ..................... 
Both “separate” and “general” checked ....        7 
Other answer or no answer ................ 


In addition to the seven who checked both, there are 9 
others whose comments suggest that both types of emphasis 
are desirable. Even with these included, an overwhelming 
majority of the 79 experts answer in favor of “separate 
factors.” 

Some typical comments from those who say both “separate 
factors” and “general intelligence”: 


Both are important. Developmental work in “separate factors” 
should probably be given higher priority. 

“Separate factors” for a period until we find what it’s all 
about. I don’t believe tests of these “separate factors” will 
ever supplant entirely the test of general ability. 

I think that there is still a definite use for the “general 
intelligence” test, but the measurement of “separate factors” 
should also be made. 

Several others simply say, “Both needed.” One interest- 
ing belief in both is expressed in these words: “ ‘General in- 
telligence’ tests for children and group factor tests for adults.” 
Another suggests that it is a question of the time available for 
testing: If only one hour, a general intelligence test; if three 
or four hours, tests for separate factors. 

Five of the psychologists explicitly reject “general intelli- 
gence” measurement altogether. They say, for example: 


“General intelligence” is a hodge-podge of several relatively 
independent group factors. 

The concept of “general intelligence” should be entirely 
discarded. 

“General intelligence” is like what Henry Ford said about 
history. 

In advocating major attention to separate functions, two 
replies particularly stress testing for the three abilities—verbal, 
numerical and spatial or mechanical; seven point to the special 
usefulness of separate ability tests for industrial or other spe- 
cific purposes; three dissociate their belief in measuring sepa- 
rate abilities from any particular “factor analysis” methods 
or any particular classification of abilities. 

On behalf of continued emphasis on the measurement of 
“general intelligence” the following comments are made: 


The value of “general intelligence” tests has been demonstrated, 
as is indicated in their widespread use, particularly in 
schools; the validity and value of measures of “separate 
factors” have yet to be shown. 

With criteria as undependable as they now are, tests of general 
ability do as much as can be expected and tests of separate 
factors represent excessive refinements. 

In respect to measuring isolated functions, attention is 
called, in four or five answers, to the limitations of this pro- 
cedure. For example: 


All so-called mental abilities seem to be intercorrelated. 
Therefore you cannot test an isolated factor if you try. 


The idea of test purity is far from clear. At best factorial 
composition always involves a “potpourri” factor. 


“Separate factors” in the strict sense cannot be now measured, 
but approximations can be made. 

A point of some interest with respect to the tabulated 
replies to this question is their relationship to the age of the 
respondents. 

The median age of those who answered “general intelli- 
gence” either alone or together with “separate factors” is 56. 
The median age of all others is 42. Looking at the matter 
in the other direction, of the 28 persons 50 years of age and 
over, 29 per cent answered “general intelligence”; of the 44 
persons under 50, only 9 per cent answered in this way. It 
almost looks as though “general intelligence” is becoming an 
old man’s concept! 
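
The percentages just quoted imply approximate head counts, which makes the size of the age difference easier to judge. The following is a minimal sketch in Python, assuming the reported percentages were rounded to the nearest whole number. The helper name `implied_count` is illustrative, and the counts it yields are back-calculated estimates, not figures reported in the survey.

```python
# Group sizes and percentages as quoted in the text above.
groups = {
    "50 and over": {"n": 28, "pct_general": 29},
    "under 50": {"n": 44, "pct_general": 9},
}


def implied_count(n, pct):
    """Nearest whole number of respondents consistent with a rounded percentage."""
    return round(n * pct / 100)


# Back-calculated (assumed) counts of "general intelligence" answers per age group.
implied = {
    name: implied_count(g["n"], g["pct_general"]) for name, g in groups.items()
}

for name, count in implied.items():
    n = groups[name]["n"]
    print(f"{name}: about {count} of {n} ({100 * count / n:.0f} per cent)")
```

On these assumptions roughly 8 of the 28 older respondents and 4 of the 44 younger ones answered “general intelligence,” so the contrast rests on about a dozen replies in all.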


Question 2 


In the field of personality testing, how satisfactory or helpful 
for present practical use do you consider: 

(a) Personality inventories and questionnaires (such as those 
of Bernreuter, Bell, Humm-Wadsworth, etc.)? 

(b) The Rorschach test? 


                                        Question (a)      Question (b) 
                                        No.      %        No.      % 
Highly satisfactory ................      1     1.5         0     0.0 
Moderately satisfactory ............      9    13.5        12    20.0 
Doubtfully satisfactory ............     24    36.0        17    29.0 
Rather unsatisfactory ..............     22    33.0        13    22.0 
Highly unsatisfactory ..............     11    16.0        17    29.0 

                                         67   100.0        59   100.0 
Qualified; unclassifiable* .........      8                 5 
No answer or don’t know ............      4                15 


* Of the 8 qualified responses on question (2a), 5 tend to be favorable, 1 
unfavorable, 2 neutral. Of the 5 qualified replies to question (2b), 3 are favorable 
and 2 unfavorable. 
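
The percentages in the tabulation are based on the classifiable ratings only (67 for the inventories, 59 for the Rorschach), not on all 79 respondents. The following is a minimal sketch in Python of that arithmetic, using the counts from the table. The names `counts`, `unclassified` and `percent_distribution` are illustrative, and the one-decimal rounding is an assumption; the published figures appear to have been rounded more coarsely.

```python
RATINGS = [
    "Highly satisfactory",
    "Moderately satisfactory",
    "Doubtfully satisfactory",
    "Rather unsatisfactory",
    "Highly unsatisfactory",
]

# Numbers of classifiable ratings, in the order of RATINGS, from the table.
counts = {
    "inventories": [1, 9, 24, 22, 11],
    "rorschach": [0, 12, 17, 13, 17],
}

# Qualified or unclassifiable replies plus "no answer or don't know".
unclassified = {"inventories": 8 + 4, "rorschach": 5 + 15}


def percent_distribution(values):
    """Percentage of the classifiable total falling in each rating category."""
    total = sum(values)
    return [round(100 * v / total, 1) for v in values]


for test_name, values in counts.items():
    base = sum(values)
    print(f"{test_name}: base {base}, all replies {base + unclassified[test_name]}")
    for rating, pct in zip(RATINGS, percent_distribution(values)):
        print(f"  {rating}: {pct}")
```

Both columns account for all 79 respondents once the unclassified replies are added back, which is a quick check that the table is internally consistent.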
The ratings of the Rorschach test tend to give a flatter 
distribution than the other. A slightly higher percentage of the 
psychologists consider the Rorschach “moderately satisfactory” 
but a markedly higher percentage also rate it as “highly 
unsatisfactory.” There is almost no correlation between the 
ratings assigned by the respondents to personality inventories 
and to the Rorschach tests (r = .11). 

When the respondents are classified into clinical and non- 
clinical (according to their statements about their own principal 
types of work), some interesting differences appear in the re- 
sponses to the above questions. 














Personality inventories                          Clinical    Non-clinical 
Rated: Highly or moderately satisfactory ....      21.0%        11.0% 
       Doubtfully satisfactory ..............      39.5         34.0 
       Highly or rather unsatisfactory ......      39.5         55.0 
       Total ................................     100.0%       100.0% 
                                                  (N = 28)     (N = 38) 

Rorschach Test                                   Clinical    Non-clinical 
Rated: Highly or moderately satisfactory ....      38.0%        11.0% 
       Doubtfully satisfactory ..............      29.0         30.0 
       Highly or rather unsatisfactory ......      33.0         59.0 
       Total ................................     100.0%       100.0% 
                                                  (N = 21)     (N = 37) 


It is clear that the clinical psychologists are somewhat more 
favorable toward both types of tests. Their opinions differ 
from those of the non-clinicians particularly in respect to the 
Rorschach. 

Similar tabulations have been made comparing the psy- 
chologists who state that a principal part of their work has 
dealt with personality tests and those whose work has not been 
principally in this field. The results are as follows: 


                                             Work with Personality Tests 
Personality Inventories                          Yes          No 
Highly or moderately satisfactory ...........    30%           9% 
Doubtfully satisfactory .....................    33           40 
Highly or rather unsatisfactory .............    37           51 
                                              (N = 30)     (N = 35) 

                                             Work with Personality Tests 
Rorschach Test                                   Yes          No 
Highly or moderately satisfactory ...........    25%          21% 
Doubtfully satisfactory .....................    21           32 
Highly or rather unsatisfactory .............    54           47 
                                              (N = 24)     (N = 34) 

Comments Regarding Personality Inventories 


The most frequent comments on question 2a are those 
which point to the clinical and qualitative value of the blanks 
as contrasted with their quantitative use for purposes such as 
selection of personnel. Under this heading come such remarks 
as the following: 


Useful in locating foci for further exploration and counseling, 
i.e., qualitatively rather than quantitatively. 


No use for selection, possibly some clinical value. 


I believe that the personality test used as a clinical tool or 
used under conditions where there is full cooperation of the 
testee and the tester can be very illuminating and practically 
useful. We have not found that they give us the correct in- 
formation when used as a selection tool under ordinary cir- 
cumstances. 


Moderately satisfactory for clinicians; highly unsatisfactory 
for industry. For industry, subject will not “come clean.” 


Most useful in counselling as points of departure, securing in- 
terest on adjustment problems, indicating students who should 
be referred to a psychiatrist, etc. 


A second group of comments has to do with the need for 
validation and further research: 


None of these tests has, in my opinion, been adequately vali- 
dated against satisfactory outside criteria. 


Highly unsatisfactory due primarily to inadequate standard- 
ization rather than to intrinsic lack of validity. 


It is impossible to measure validly the gradations of person- 
ality adjustments by pencil-and-paper tests. Coarse distinc- 
tions that are reflected are usually obvious without testing. 
It would be helpful if psychologists as a group would go on 
record to this effect. 


These tests validated for specific jobs have been helpful. In 
their present form, using standard norms, they are not helpful 
in industry. 


Two and one-half years of work in the service have shown no 
helpfulness in any of these “ready-made” tests which are pre- 
sumed to measure “traits” which clinicians presumed to exist. 
On the other hand, both Army and Navy have been able to 
demonstrate the use of “personality tests” constructed to meet 
the specific behavioral requirements of specific fields. 


Other comments call attention to the value of the personality 
blanks insofar as they are competently and cautiously 
interpreted. Examples: 


If used by qualified person. Should never be depended on for 
seme diagnosis without careful check. Of some value for 
research. 


A very great deal depends on the skill and judgment of the 
person making use of the results. 


In the hands of competent clinicians such devices appear quite 
useful—used with other data. For most counselors, who may 
not add salt, better counselling in regard to personality prob- 
lems will be produced without such inventories and ques- 
tionnaires. 


Several replies also call attention to the differences in value 
among the various inventories in use. As one reply puts it: 


There are some 500 personality tests, most of which are of 
little or no value as measurement devices. A few, probably 
not more than a dozen, could be recommended for experi- 
mental use in a testing program. 


In the few comments regarding particular blanks, the Bell 
and the Minnesota multiphasic inventories receive favorable 
mention, the Humm-Wadsworth unfavorable, while the Bern- 
reuter receives both praise and disapproval. 

Scattered comments on other points of interest about per- 
sonality questionnaires are these: 


They are difficult to improve beyond the present level. It 
will take a first-rate genius to make any great improvement. 
A lot of effort has been expended lately, with little progress 
resulting. 


Moderately satisfactory at extremes of distribution. Prin- 
cipal value impresses me in terms of serious deviation from 
norm rather than as absolute score values. 


The psychologically sophisticated person who has some motive 
for making a good impression can consciously distort results; 
disturbed person doesn’t know the true answers with respect 
to himself. 


Such tests should always be supplemented by other data, such 
as observations, anecdotal records, and projective techniques. 
Personality inventories are often excellent aids in “screening” 
individuals for further detailed observation and study. 








Comments Regarding the Rorschach Test 


The quantitative replies to Question 2b were tabulated 
above. The ratings are not greatly different from those per- 
taining to personality questionnaires save that there is an in- 
crease in the number of “Highly Unsatisfactory” ratings in 
contrast with the “Rather Unsatisfactory,” and there is a 
decided increase in the number of “Don’t Know” responses. 

The comments with respect to the Rorschach test are 
notably more vigorous. There are numerous references to 
“cultism” and “overselling,” and even more frequent specific 
criticisms concerning the lack of validation. On the other 
hand, a considerable number of the psychologists believe that 
the Rorschach has value if used clinically by adequately 
trained persons. Since the Rorschach technique is so much in 
dispute, it is worth while reproducing a considerable number 
of evaluative comments. They are grouped below into the 
two broad divisions just mentioned, plus a miscellaneous set 
of comments. The parentheses after each quotation contain 
the rating assigned the test and also indicate whether the 
respondent considers himself a clinical psychologist or not. 


A. Rorschach Test of More or Less Value Used Clinically by 
Trained Persons 


Dangerous for amateurs. A valuable instrument in the hands 
of a psychiatrist adequately trained in its use. (No rating; 
clinical.) 


I feel this is of value as a strictly clinical instrument in the 
same way that free association is, but any attempt to objectify 
scoring of it appears to lead to invalid results. (Doubtfully 
satisfactory; clinical.) 


The Rorschach has already demonstrated its value in clinical 
usage. The increasing research by “courageous heretics” on 
modified Rorschach techniques may be expected to produce 
instruments of considerably more merit than are yes-and-no 
inventories. (Moderately satisfactory; clinical.) 

In the hands of a few well-trained experts the Rorschach test 
may be “moderately satisfactory,” but it requires too much 
highly specialized training under the right supervision to de- 
velop the reliability needed in practical work. (Doubtfully 
satisfactory; clinical.) 

If one sticks to the few basic categories, the Rorschach is 
valuable. I tend to doubt the validity of the detailed analyses 
some Rorschach experts are able to make. (Moderately 
satisfactory; not clinical.) 


B. Lack of Adequate Validation; “Cult,” etc. 


I think more systematic research and less cultism could produce 
something of value in this particular projective technique, 
as would be true of any projective technique. (No 
rating; clinical.) 

Do not feel “expert” on this item. This test smacks too much 
of mystery and “cultism.” Also rather too esoteric in rami- 
fications for sound scientific appraisal. (Moderately satisfac- 
tory; clinical.) 

There is need for an empirical validation of this technique. I 
am impressed by the extent to which its validity is assumed 
a priori in terms of some semantic scheme. (Doubtfully sat- 
isfactory; clinical.) 

Highly promising but needs much further research before 
conclusions can be warranted. Much of present work is 
scientifically unsound; but some good leads have appeared. 
(No rating; clinical.) 

Too subjective; clinical signs employed are shifted from study 
to study. Still in the experimental stage and should not be 
used for practical purposes as yet. (Rather unsatisfactory; 
clinical.) 

As a diagnostic instrument its value is entirely unproved and 
the Rorschach workers are going about its validation the 
wrong way: Too much cultism and intuition and too few 
cold facts! (Highly unsatisfactory; not clinical.) 

There has been grossly inadequate validation of the claims for 
the Rorschach. (Doubtfully satisfactory; not clinical.) 

So time-consuming and subjective as unlikely to contribute 
much that a skillful interviewer would not obtain more 
promptly by direct means. (Highly unsatisfactory; not 
clinical.) 

Found utterly useless for predicting success in training of 
aviation pilots. (Highly unsatisfactory; not clinical.) 

Those who use the Rorschach seem always to fall under the 
spell of the special language they have developed and to be 
more interested in assigning names than in making any 
extensive and critical investigation of the validity and 
reliability of their basic concepts. (Doubtfully satisfactory; not 
clinical.) 


C. Other Comments 


When, as in some hands, the Rorschach test proves useful I 
attribute it more to the good sense of the user than to the in- 
strument. (Rather unsatisfactory; clinical.) 
This test still leaves much to be desired but is certainly a 
step in the right direction. If it could be more easily scored 
and more objective it would be the most effective instrument 
I know of for clinical measurement. (Moderately 
satisfactory; clinical.) 

From the standpoint of the “buyer” it is worth about 15% of 
the time one can spend on an individual examination. For 
the work I do, would practically always want it or a 
substantial equivalent; for dynamic purposes its potentialities are 
less than those of T.A.T. (Moderately satisfactory; clinical.) 

The Rorschach test is highly promising. Group administration 
techniques should make it more widely applicable. 
(Moderately satisfactory; not clinical.) 

A good idea poorly carried out. (Highly unsatisfactory; not 
clinical.) 


Question 3 


What do you consider the most promising mental test de- 
velopments for research students to devote themselves to 
during the years after the war? 


Most of the replies fall into a few broad categories, under 
which responses are classified below. (In considering the 
numbers of replies in different classes it should be noted that 
many respondents listed several ideas; hence the total number 
of suggestions far exceeds the number of persons answering.) 

Most frequently mentioned are needed developments of 
new tests, especially tests of emotion and personality traits. 
Thirty-six of the 79 psychologists point to work on new tests; 
27 of these indicate tests in the field of personality. This 
result may have been influenced in some degree by the fact 
that the immediately preceding questions pertained to person- 
ality blanks and Rorschach tests. 


Illustrative Suggestions Regarding Development of New Tests 


Independent measures of ability; specific aptitude tests. 

Tests of mental development that evaluate objectively the 
higher mental processes (as in the Eight-Year Study of 
the Thirty Schools Experiment of the P.E.A.). 

Better non-language performance tests for children and adults. 

Better materials, individual and group, for exploring aptitudes 
of gifted children. 

Mental tests for adults, especially older persons. 

Short tests for industrial use. 
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| Mentioned next most often (by 18 respondents) is the need 
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Culture-free tests of general ability. 

Tests for special groups—bilingual, blind, deaf, etc. 

Wisdom, thinking, judgment, etc., as distinct from mental 
alertness. 


Development of tests for such traits as perseverance, ability 
to supervise (leadership), emotional and social maturity. 

Objective personality tests; indirect scoring so that subject 
doesn’t know its real purpose. 

Projective techniques as personality tests; objectification of 
projective tests. 

Personality tests using operationally defined concepts tied to 
particular fields and giving up the magic of such alleged 
traits as “extroversion,” “dominance,” and the like. 

Measurement of personality factors and of the as-yet-un- 
measured but important intellectual factors. 

One important area in business is to measure “drive” in pros- 
pective executives. 

Tests for specific personality traits for which good criteria are 
available. 


Measurement of “basic human drives.” 


Useful and readily scorable interest inventories. 
Interest measures through a wider range of occupational, edu- 
cational, and avocational activities. 


Achievement tests through a wider range of life and job 
situations. 

Tests of educational development that yield sub-scores indi- 
cating specific aspects of intellectual development. 

Trade and proficiency tests geared to specific occupations. 

Tests to measure achievement that will actually be functional 
in normal life, e.g., homemaking, consumer science, health, 
marriage, child development, social attitudes, labor rela- 
tions, propaganda analysis, etc. 


for work on criteria and the carrying on of validation studies 
to ascertain the relation of particular tests to criteria. 


Illustrative Suggestions Regarding Validation Studies 
and Criteria 


More concentration on validation on large representative 
samplings against adequate criteria, especially with adults 
against vocational success. 


The predictive values of specific tests for specific performances 
in practical tasks. From these specifics it will be possible to 
develop data regarding the “types” of tests that predict suc- 
cess in “families” of occupations or activities. 
Factor analysis of abilities and of criteria and establishment 
of satisfactory inter-relationships among the tests and criterion 
factors. 


Decrease emphasis upon new test construction; increase 
emphasis upon the development of adequate criteria and 
validation and cross-validation of tests of special aptitude and 
other predictors against such criteria with samplings of ac- 
ceptable size. 


Development and evaluation of realistic criteria for: (a) occu- 
pational successes in given job titles, (b) occupational success 
as general occupational adjustment, and (c) general social 
and personal adjustment. Of the two needs (work on tests 
and on criteria) it is considered that the criterion side should 
be given most emphasis. 


By all means investigators should work as hard on good 
methods of evaluating proficiency on the job as they do on 
tests. As an end in itself, this has most salutary effects, but 
it is essential if one is to validate selective tests adequately. 





The one remaining category into which many suggestions 
fall (17 responses) pertains to studies aimed at analyzing and 
interrelating the component factors of ability, either by 
factorial methods or otherwise. 


Illustrative Suggestions Regarding Factorial Studies 


Application of factor analysis results to the construction of 
differential aptitude batteries, followed by standardization of 
such batteries on a wide range of schools and occupations. 

Identification and measurement of separate factors. 
Construction of tests which will measure such factors as 
independently as possible. 

Psychological analysis of separate or group factors to 
supplement or replace the mathematical analysis now so much 
emphasized. In recent years we have had an orgy of statistical 
analyses. 

Isolation of meaningful complexes or factors. 

Refined experimental work on the isolation of mental abilities. 

Fundamental research on the best reference variables of 
intellectual and temperamental aspects of personality is badly 
needed. 


In addition to the above types of reply, the question elicited 
a variety of other individual answers. The content of a number 
of these may be suggested by a mere listing of topics: 


Research on the interview. 

Attention to traits which are not measurable. 

Analysis of job requirements. 

Attitude inventories. 

Relationships between successive annual increases 
of intelligence. 

Changes of intelligence with age. 

Construction of prediction tables. 

Reliability studies. 

More adequate adult norms; greater attention to 
representativeness of samples. 


A few further points of some interest which have not been 
covered in the above categories are presented in the following 
quotations: 







The development of a general battery that will measure all of 
the occupationally significant factors and which can be secured 
for groups of occupations covering the entire occupational 
range. Such a testing technique is needed for occupational 
counselling. The findings with respect to groups of occupations 
requiring similar abilities would also have an important 
bearing on curricular development. Accomplishment of this 
task would demand cooperation among various research 
groups. 

Developments designed to encompass all major aptitudes as 
opposed to particularity of appraisal, i.e., composite 
description of the complete person. We need test batteries 
standardized on the same sample with interrelations and differential 
validities. 

The most urgent need is to try many tests of various kinds 
on the same people. . . . The establishment of unique human 
profiles is our most urgent need. 

Relation of responses as elicited by inventories and 
questionnaires to variations in behavior under different environment 
and changes accompanying education and training. 

The use of cumulative records of comparable test data 
throughout the school and early employment life of the 
individual, with equal or greater interest on cumulative anecdotal 
records of actual behavior. 


CIVILIAN TESTING IN THE QUARTERMASTER CORPS 


W. C. KVARACEUS AND W. N. DUROST 
Civilian Testing Section, ASF 
AND 


R. F. McCLELLAN 
Office of Quartermaster General 


The Quartermaster Corps is one of the several supply 
services in the United States Army. The duties of the 
Quartermaster Corps comprise the initiation, procurement, 
supply, and maintenance of all articles and equipment needed 
by every soldier or necessary to the administration of the 
United States Army with the exception of the weapons with 
which the soldier fights and certain classes of transport. In 
order to supply, feed and clothe some nine million men quar- 
tered in both hemispheres, the Army maintains 22 Quarter- 
master and Army Service Forces Depots in the United States 
and draws heavily on the civilian worker to assist in this vital 
phase of the war effort. Roughly, 80,000 civilians are em- 
ployed in some 300 different jobs in these depots throughout 
the country. These civilian jobs range from the highly skilled 
technical and professional positions to those of unskilled 
laborers. 

With the advent of the war the Quartermaster Corps, like other 
technical services, expanded to fantastic proportions in the way 
of a world-wide supply organization. This tremendous growth 
demanded the hiring of thousands of workers and presented 
many problems of assignment, training, and employee relations. 

Lieutenant General E. B. Gregory, the Quartermaster Gen- 
eral, was quick to recognize the basic management principle that 
results are achieved through people. Since civilian employees 
comprise a large percentage of the total personnel in the in- 
stallations under the jurisdiction of the Quartermaster General, 
the accomplishment of the mission of the Quartermaster Corps 
depended, to a very large extent, upon the effective manage- 
ment and utilization of civilian personnel. As one means to- 
ward this goal, the Quartermaster General, through Colonel 
Eugene G. Mathews, Chief of the Civilian Personnel Branch 
of the Office of the Quartermaster General, in close coopera- 
tion with the Civilian Personnel Research Sub-Section, Per- 
sonnel Research Section, Classification and Replacement 
Branch, Office of the Adjutant General, has encouraged the 
proper use of psychological tests. It was felt that these tests 
could provide concrete evidence concerning employee knowl- 
edge, aptitudes and skills for use particularly as an aid to 
Placement, Training and Employee Relations activities. As a 
result of much thinking and considerable experimentation in 
a number of depots, a Civilian Testing Section has been set 
up within the Civilian Personnel Branch, Personnel Division 
at OQMG to: 

a. Encourage the use of ability, skills, and aptitude test- 
ing in all Quartermaster and Army Service Forces de- 
pots employing civilians, and to advise in the estab- 
lishment of a Testing Section as a component part of 
the personnel organization. 

b. Coordinate and standardize all testing activities cur- 
rently being conducted and proposed in all Quarter- 
master and Army Service Forces Depots. 

c. Render technical and staff assistance to all Quarter- 
master and Army Service Forces Depots through the 
issuance of a testing manual that will serve as the official 
guide in the use of tests and testing materials and spe- 
cific Quartermaster testing policy and procedure. 

f. Compile for the Quartermaster General such progress 
reports on testing as may be requested. 


Cooperation of Office of The Adjutant General 

At the request of Colonel Eugene G. Mathews, Chief of 
the Civilian Personnel Branch, Office of The Quartermaster 
General, two technicians¹ were assigned to the Quartermaster 


1 Dr. W. C. Kvaraceus and Dr. W. N. Durost, who were attached at this time 
to the Field Staff, Civilian Personnel Research Sub-Section, AGO. 
Corps to assist in planning and setting up testing programs 
in selected depots and at the headquarters level. This was to 
provide a background of experience on the basis of which to 
set up service-wide testing, if the experimental program showed 
any worth-while promise in solving some of the personnel prob- 
lems facing the Placement, Training and Employee Relations 
officials. The Office of the Adjutant General already had 
constructed tests of general intelligence, clerical and mechanical 
aptitude, and knowledge and skills and had been given clear- 
ance for use of all testing materials prepared by the United 
States Employment Service. Machinery had also been set up 
within the Civilian Personnel Research Sub-Section of the 
Adjutant General’s Office to construct additional tests when- 
ever the need for such tests was shown. In general, the de- 
velopment of a systematic and comprehensive service-wide test- 
ing program throughout the Quartermaster Corps has been 
done with the active assistance and cooperation of the Office 
of the Adjutant General. 


Trial Testing in Selected Depots 


Some testing already was going on in several depots² before
technical assistance was procured from the Office of the Ad- 
jutant General. This testing was usually an adjunct of either 
the placement activities, the training program, or the employee 
relations activities. Often, the testing was spotty and hap- 
hazard, and seldom was a qualified full-time or even part-time 
technician in charge. The testing activities in these depots, 
however, revealed an awareness of the fact that some assistance 
could be obtained in the personnel program through the wise 
use of psychological tests. 

Personnel technicians from AGO visited two Quartermaster 
Depots to set up testing activities. In each instance, qualified 
technicians with adequate professional and clerical staff were 
recruited to head the program in the local installation. In 
another depot, the testing activities already had been set up 
under the Placement Branch. In one of the new installations 

2 Credit is due the Philadelphia Quartermaster Depot which, under its own
initiative, had activated a comprehensive testing program prior to headquarters
planning.

20 EDUCATIONAL AND PSYCHOLOGICAL MEASUREMENT 


Testing was placed under Training. In the other, Testing 
was established as a separate unit coordinate with Placement, 
Training, and Employee Relations. The head of this separate 
testing unit was made immediately responsible to the Civilian 
Personnel Officer. The latter type of organization was finally 
recommended and adopted as most promising if a depot-wide 
program were to function with the maximum effectiveness. 

As soon as the staff was recruited in a depot, attention was 
turned to the application of testing as an aid in the solution 
of operating problems. In one of the initial depots it was 
found that important reassignments and promotions were to 
be made within the Shoe Inspection Section. The local test- 
ing technician, aided by the AGO representative, prepared a 
battery of tests which was given to all shoe inspectors attached 
to the depot. This battery included a learning ability test, 
a shoe inspection information test which was constructed for 
the purpose and published by AGO, a man-to-man rating scale, 
and an activity preference questionnaire, also specially con- 
structed. After the tests were given and the data summarized 
for the operating officials, further assistance was rendered in 
utilizing the test results in specific personnel actions. For example,
the five best all-round men were selected, from whom
one was later chosen for a special assignment.

In another depot, file clerks were tested with appropriate 
instruments to discover those clerks whose filing skill was low 
and who were largely responsible for “messy filing conditions.” 
Those file clerks who showed limited alphabetizing skill, but 
who did reveal high learnability, were assigned for training; 
the file clerks who lacked aptitude for this job were re-tested 
with other clerical batteries and were reassigned to jobs for 
which they showed more promise. In still another depot, 
where considerable difficulty had been experienced in the Fiscal 
Branch due to numerous errors in arithmetic processes, all 
fiscal clerks were given arithmetic tests to discover the indi- 
viduals who might be largely responsible for the recurring 
arithmetic errors. Again, according to the test findings, clerks 
either were retrained or reassigned. At the same time, testing 
of all incoming employees was started immediately as an aid
in the assignment of personnel to specific jobs at grade as
determined by the Civil Service Commission. 

The keynote of the field service was the development of a 
testing program which would serve as an aid to solving depot 
problems and, at the same time, demonstrate the potential 
value of test results in guiding personnel action. These trial 
testing programs rapidly expanded and became an integral part 
of the depot organizations. With this experience and the con- 
viction that the wise use of tests could materially aid these and 
other depots, the establishment of a Testing Section at the 
headquarters level, Office of the Quartermaster General, was 
accomplished. The purpose of the newly established Testing 
Section was to coordinate, encourage, and advise in the estab- 
lishment and functioning of testing sections throughout depots 
under the jurisdiction of the Quartermaster General. 


The Contribution of Testing to the Total Personnel Program 


On the basis of the experience in these depots, testing was 
conceived as a service (staff) function standing in relation to 
the Civilian Personnel Officer in much the same way that Depot 
Control stands to the Commanding Officer. Tests could be 
helpful in that they provide objective evidence in the form of 
test scores upon which personnel action could be based. Some 
of the personnel actions to which testing was found to make 
a notable contribution were as follows: 

Placement of incoming personnel. Although the Civil Ser- 
vice Commission reserves the right to certify employees as to 
grade, it does not attempt to specify to what specific duties 
within a given grade an individual is to be assigned. This 
leaves the local depot considerable latitude in placing new per- 
sons on jobs for which they are best suited. The local depot 
itself must test new people if test scores are to be made avail- 
able for most effective placement purposes. A variety of tests 
may be used for this purpose, but basically most Quartermaster 
Depots found that a limited battery of aptitude and achieve- 
ment tests could serve most purposes satisfactorily. 

Reassignment of personnel at grade. Reclassification of 
personnel is a very serious business, regardless of the direction
of the change, whether it be promotion or demotion. Testing
can do much to support such personnel action by demonstrat- 
ing the presence or the absence of the desired skills, knowledge, 
or abilities. 

Selection of personnel for training. Far too much of the 
training being carried on in the operating installations was 
found to be haphazard in the sense that a general order was 
issued to train persons in some specific area such as the use of 
the War Department Shipping Document or military corre- 
spondence, without knowledge of which of the persons selected 
for training already knew the material to be covered by the 
course. In the case of those whose knowledge was incomplete, 
there was no evidence as to the specific areas where gaps in 
knowledge existed, so that the subsequent training was neces- 
sarily on a general rather than a selective basis. By the 
judicious use of specific information tests in such situations,
three things were accomplished. (1) Those with a mastery 
of the information sufficient for the needs of their job were 
excused from training. (2) The training of those lacking such 
basic knowledge was justified to their supervisors on the basis 
of objective evidence. (3) The training was directed to meet 
the areas of the greatest need. Following training, retests re- 
vealed the extent to which training had been successful in 
imparting basic information. Note should be made here that 
the failure to get information across to a group may be due to 
a variety of causes. Testing alone will not reveal the reason 
for failure, but only its existence. 
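
The screening procedure described above can be sketched in a few lines. This is only an illustration: the topic areas, the clerk identifiers, and the 80 per cent mastery cutoff are hypothetical and are not taken from the depot program.

```python
# Hypothetical mastery cutoff (fraction of items correct); not from the source.
MASTERY_CUTOFF = 0.80


def screen_for_training(area_scores, cutoff=MASTERY_CUTOFF):
    """Split employees into those excused from training and those needing it.

    area_scores maps employee -> {topic area: fraction of items correct}.
    Returns (excused, needs), where needs maps employee -> list of weak areas.
    """
    excused, needs = [], {}
    for employee, scores in area_scores.items():
        weak = sorted(area for area, s in scores.items() if s < cutoff)
        if weak:
            needs[employee] = weak      # training directed at these areas only
        else:
            excused.append(employee)    # mastery shown in every area
    return excused, needs


def mean_gain(pretest, retest):
    """Average retest-minus-pretest change for employees tested both times."""
    common = [e for e in pretest if e in retest]
    return sum(retest[e] - pretest[e] for e in common) / len(common)


# Illustrative data only.
scores = {
    "clerk_a": {"routing": 0.90, "copies": 0.85},
    "clerk_b": {"routing": 0.60, "copies": 0.90},
    "clerk_c": {"routing": 0.50, "copies": 0.40},
}
excused, needs = screen_for_training(scores)
print(excused)
print(needs)
```

A gain computed this way shows only whether scores rose after training; as noted above, it does not explain why a group failed to improve.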

Substantiation of claims of supplementary or higher skill. 
Many manpower utilization programs listed employees’ supple- 
mentary or secondary skills. This information generally came 
from an interview with the employee. Unless the information 
thus obtained was substantiated by the use of objective tests, 
little dependence could be put upon individual claims in select- 
ing persons for higher level jobs. 

Separation for cause. When it was necessary to remove
an employee for cause, such action was greatly strengthened 
by the use of objective tests of an informational or work-skills 
type if the cause was inability to do the work.


The Place of Testing Services in the Depot Organization


Since testing is a staff function, serving all branches within 
the total personnel program, it cannot fully perform its functions
if it is tied to any one of these branches. This has been
conclusively demonstrated in all the experience of the Quarter- 
master and Army Service Forces Depots. When Testing was 
tied to Placement, its energies were largely absorbed and its 
policies determined for the most part by the needs and prob- 
lems of the Placement Branch to the exclusion of Training, 
Employee Relations, and Operations. When Testing was tied 
to Training, it was found to be similarly handicapped by having 
the focus of attention placed on the needs of the Training 
Branch to the exclusion of the needs of the other branches. 
When Testing was tied to Employee Relations, it was inclined 
to take on a guidance aspect, which was not always in the best 
interests of Placement and Training. Only when the Testing 
Unit was independent and autonomous, its chief reporting di- 
rectly to the Civilian Personnel Officer, could it avoid suffering 
from the restrictions in its activities that were invariably asso- 
ciated, in the minds of operating personnel, with the specialized 
activities of the branch to which it had been tied. 

The desired independence for the testing functions, it was 
found, could be secured in several ways, two of which were 
recommended. First, Testing could be set up as a separate 
branch coordinate with Training, Placement, Employee Rela- 
tions, and Classification. Second, Testing could be set up as 
an adjunct of the office of the Chief of Civilian Personnel, in 
much the same way that Depot Control reports directly to the 
Commanding Officer. The first plan was approved and forms 
the basic pattern in most Quartermaster and Army Service 
Forces Depots. It was recognized that the particular pattern 
of organization arrived at for any given installation should be 
determined in the final analysis by local factors. But always 
it is backed up by the recommendation from the top echelon 
that the testing activity be given the necessary independence 
to permit it to do its work unhampered by subordinating it to 
other personnel functions.

Another major reason for this organizational pattern was
the fact that the quality of personnel needed to operate an 
effective testing program is at least on a par with, if not superior
to, the quality of the personnel required in any other personnel 
function. It is not good management to subsume one activity 
under another when the chief of the subordinate activity is 
classified as high as or higher than the chief of the parent 
branch. 


The Staffing of the Testing Branches 


It was soon discovered that the type and number of per- 
sonnel needed for a testing program depended on the number 
of civilian employees to be served and the variety and com- 
plexity of the jobs filled by the civilian employees, as well as 
on the strength of the personnel program in operation in the 
particular installation. Considerable care has been given to 
the recruitment of trained and experienced personnel to head 
the testing activities in the depots. As the first step in de- 
veloping a promising testing branch, a trained personnel tech- 
nician was hired. Obviously, the successful functioning of a 
testing program will center around the qualifications and ex- 
perience of the personnel hired for these strategic positions. 
The duties and responsibilities of the personnel technician in 
charge of the Testing Section or Branch were such that the 
position always was classified at a professional level, and if 
the depot was very large (5,000 to 8,000 civilian employees)
a fairly high professional grade was called for. These jobs
were set up in accordance with the specifications outlined by 
the Office of the Secretary of War. Usually the personnel technician
in charge of testing had one or more professional assistants,
depending again on the size and type of the depot. One
or two clerks rounded out the office force. The following job 
description, taken from a typical depot, presents a concrete 
picture of the type of activity involved and the type of per- 
sonnel recruited. 


Personnel Technician (Testing) P-3 


Supervision received: Works under the general administrative and technical 
supervision of the Director of Civilian Personnel, but with considerable latitude in 
planning the details of specific assignments, and with the responsibility of carrying 
out these assignments without detailed direction as to technique. 

Supervision exercised: From time to time will supervise varying numbers of 
clerical and technical personnel engaged in administration, scoring, and item-an- 
alyzing of tests, rating scales, questionnaires, etc., and in the compilation of data 
growing out of the use of such instruments, the computation of necessary statistics, 
and the preparation of reports. Will be responsible for the training of such clerical 
personnel in the necessary techniques, when personnel with previous experience are 
not available. 

Duties and responsibilities: At the X Quartermaster Depot (a Class IV installa- 
tion employing several thousand civilians in separate operating units) is responsible 
for making the preliminary selection of tests, rating scales, questionnaires and other 
devices for use in meeting specific employee personnel problems. Such preliminary 
selection will be subject to the approval of the Director of Civilian Personnel. Is 
responsible for the administration of such tests, rating scales, questionnaires, and 
other devices to the personnel selected, either administering the instruments per- 
sonally or training clerical personnel to do such administration. Is responsible for 
maintaining reasonable working conditions in the space provided for the admin- 
istration of such instruments, with respect to lighting, ventilation, freedom from 
interruption, etc. Is responsible for the scoring of these instruments, for the sum- 
marization of the data so obtained, for its interpretation to the operating officials 
who will use the information (Placement, Training, Employee Relations, Classifica- 
tion, Operations), and for the preparation of reports at periodic intervals and on 
special occasions as required. When a test or other instrument must be selected 
for measuring aptitude or skill in the performance of the duties of some specific 
position, the Personnel Technician is required to familiarize himself with the details 
of the position involved by consulting with the Classification Analyst or by making 
job analyses, or by consulting the Testing Section of the Office of the Quartermaster 
General or other agencies such as the Adjutant General’s Office. If no test is avail- 
able, may be required to construct a suitable instrument. When a test or other 
instrument is required to cover some specific body of information, such as the nature 
of and the regulations covering the use of the War Department Shipping Document 
or Procedure, is required to consult such sources as enumerated above to discover 
the existence of such a test, and if none is available, construct one to fit the need. 
Is expected to construct tests, rating scales and questionnaires from time to time 
in connection with the selection of personnel for training and the measurement of 
achievement after training. In all such test construction work, the procedures used 
must be acceptable from a professional standpoint in line with recent developments 
in this field. In the analysis of test data or data obtained by use of questionnaires, 
rating scales, or other similar means may be required from time to time to compute 
means, standard deviations, correlation coefficients of various kinds, reliability and 
validity coefficients, to prepare bar diagrams, percentile curves, histograms, etc., and
to set up local percentile or standard score norms and perform other related duties 
as assigned. 
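
The statistics named in this job description can be illustrated briefly. The sketch below computes a mean, a standard deviation, a Pearson correlation, and a split-half reliability stepped up with the Spearman-Brown formula. The choice of the split-half method and the item data are assumptions made for illustration; the job description does not say which reliability methods were used.

```python
from statistics import mean, pstdev


def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length lists."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)


def split_half_reliability(item_marks):
    """Split-half reliability with the Spearman-Brown correction.

    item_marks holds one list of 0/1 item marks per person. Items in
    positions 1, 3, 5, ... form one half; items in positions 2, 4, 6, ...
    form the other.
    """
    first_half = [sum(person[0::2]) for person in item_marks]
    second_half = [sum(person[1::2]) for person in item_marks]
    r_half = pearson_r(first_half, second_half)
    return 2 * r_half / (1 + r_half)


# Illustrative marks for five people on a six-item test (hypothetical data).
marks = [
    [1, 1, 1, 1, 1, 1],
    [1, 1, 1, 0, 1, 0],
    [1, 0, 1, 1, 0, 0],
    [0, 1, 0, 0, 1, 0],
    [0, 0, 0, 0, 0, 0],
]
totals = [sum(person) for person in marks]
print(mean(totals), pstdev(totals))
print(round(split_half_reliability(marks), 3))
```

A validity coefficient would be obtained the same way, by passing test scores and criterion measures to `pearson_r`.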

The importance of developing a working testing program 
which is firmly set on a concrete foundation of adequate per- 
sonnel cannot be over-emphasized. 


Tests and Test Batteries 


All testing for placement purposes and a greater part of 
all other testing in the depots is done with some particular job 
in mind. Either the person tested is being considered for a 
definite opening or the adequacy of his performance in a par- 
ticular position is being appraised. There are too many jobs 
in the Quartermaster Corps to permit the establishment of a 
recommended battery of tests for each job separately. How- 
ever, it is possible to recommend test batteries for a series of 
positions, such as certain series in the clerical or mechanical 
fields. In a few cases specific batteries were set up for job 
classes or for specific jobs. This has been done for those posi- 
tions which appear with the greatest frequency in the various 
depots. 

The basic test⁵ usually given to every employee who is
hired, or who is referred for testing for any reason, is the Learn- 
ing Ability Test, which exists in two forms. This is a general 
verbal abilities test, omnibus type, using multiple-choice items, 
which closely resembles a general intelligence test such as the 
Otis type. In cases where there is a language handicap or a 
question of illiteracy, a non-reading intelligence test is substi- 
tuted. The next most widely used test is a clerical aptitude 
test, which is given in total or in part to all persons in clerical 
positions. 

Other tests commonly used in the depots include the follow- 
ing: Number Speed, Typing, Shorthand, Military Correspon- 
dence, Digit Reversal, Word Meaning, Coding; Clerical English 
Battery, including tests of Abbreviation, Capitalization, Com- 
pound Words, Grammar, Punctuation, Spelling, Word Division 
in Typing, and Word Selection; Mechanical Aptitude and 
Technical Aptitude Test, including the following: Mechanical 


5 All AGO tests are restricted materials. Commercial tests are used in the
Quartermaster Civilian Testing program only when AGO tests are not available.

Knowledge, Visual Discrimination, Space Relations, Inspection
Speed, Technical Mathematics, Technical Reading, Figure
Cancellation, Elementary Electricity, General Automotive
Information, and Radio Information Tests. In addition, specific
tests of knowledge on the War Department Shipping Documents
and the Vendor’s Shipping Document have been prepared.
Tests for warehouse jobs, such as packers, checkers,
and storekeepers, have also been constructed. Most of these
latter tests are used primarily in the training program.

The Division of Occupational Analysis, War Manpower
Commission, has authorized the Adjutant General to reprint
or adapt, on a restricted basis for Army use, the tests
constructed for the United States Employment Service. At the
same time, the Oral Trade Questions have also been made
available through official channels. In all, several scores of
tests are available for use.

Attempts are being made to set up batteries of tests for
some specific jobs. Norms are being gathered in terms of the
performance of new employees and in-service employees. The
following batteries are given as examples of specific batteries
prepared for specific jobs.

Clerk-Stenographer
    Learning Ability Test
    Clerical Aptitude Test
    Typing and Shorthand Test

Checker
    Learning Ability Test
    Clerical Aptitude Test
    Number Speed Test

Inspector of Clothing
    Learning Ability Test
    Inspection Speed Test
    Optical Precision Stereoscope Test
    Rate of Manipulation Test
    Color Perception Test

Fork Lift Operator
    Learning Ability Test
    Eye, Hand, Foot Coordination Test
    Physical Fitness Tests
        Vision
        Endurance
        Hearing

Inspector of Shoes
    Learning Ability Test
    Inspection Speed Test
    Optical Precision Stereoscope Test
    Quartermaster Shoe Inspection Test

Contract Negotiators
    Learning Ability Test
    Clerical Aptitude Test
    Critical Thinking Test
    Arithmetic Reasoning Test
    Personality Questionnaire

Baler
    Learning Ability Test
    Revised Army Beta Test
    Rate of Manipulation Test
    Physical Fitness Test

These are examples of specific test batteries assembled in
terms of the actual skills involved on the job. Some of these
batteries are now in the process of validation.


Test Records and Reports 


It cannot be emphasized too strongly that testing is a service
function that has no value unless the tests are used.
Hence, the system of test records and the method of interpretation
are aimed at maximum utilization of test
results.

Raw scores are never given to operating personnel or to 
anyone outside of the Testing Branch. Various types of norms 
are available, and their use depends upon the purpose to which 
the test scores are to be put. The most generally used type is 
one in which the total group upon which the norms are based 
is divided along the base of the normal curve according to a 
five-point scale. Each of these groups is rather easily characterized
in terms of adjective phrases, indicating degrees of
goodness in the quality, skill, or ability measured by each test.
A test report with these descriptive phrases is made out for each
person tested, and usually turned over to the Placement Tech- 
nician, Employee Counselor, or Training Director. In other 
words, the usual procedure is to give the operating personnel 
an interpretive comment on the test results rather than the 
test scores themselves. More detailed norms are available in 
the Testing Branch for use in cases which call for closer 
interpretation. 
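
A five-step conversion of this kind can be sketched as follows. The cut points (at 0.5 and 1.5 standard deviations on either side of the mean) and the adjective labels are assumptions chosen for illustration; the actual divisions and phrases used in the depots are not given here.

```python
from bisect import bisect_right
from statistics import mean, pstdev

# Hypothetical cut points in standard-deviation units, and hypothetical labels.
CUTS = (-1.5, -0.5, 0.5, 1.5)
LABELS = ("Very Poor", "Poor", "Average", "Good", "Very Good")


def five_step_grade(raw, norm_scores):
    """Convert a raw score to one of five descriptive grades.

    The raw score is expressed as a deviation from the norm-group mean in
    standard-deviation units, then placed in one of five bands.
    """
    z = (raw - mean(norm_scores)) / pstdev(norm_scores)
    return LABELS[bisect_right(CUTS, z)]


# Illustrative norm group (hypothetical raw scores).
norm_group = [30, 40, 50, 60, 70]
for raw in (28, 40, 50, 60, 75):
    print(raw, five_step_grade(raw, norm_group))
```

Only the descriptive grade would be passed to operating personnel; the raw score stays in the Testing Branch.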

The test results (raw scores) are recorded on a Test Record 
Card which is filed in the Testing Branch. A key-sort type of 
card is the recommended record card used in most Quarter- 
master and Army Service Forces Depots. At the same time, 
test results in terms of five-step grades are entered upon the 
Employee’s Qualification Card which is maintained by the 
Placement Branch. This card is always consulted whenever 
any personnel action is contemplated. Considerable use is also 
made of percentiles as a further interpretive score. 

A daily log of the test results of all individuals examined is 
recorded in duplicate. One copy of the daily results is for- 
warded to Headquarters monthly, where the results are studied 
and service-wide norms are developed. The local installation 
uses its copies of the daily log to establish local norms. 
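
The building of local percentile norms from the daily log might look like the following sketch. The log layout (employee, test name, raw score) and the mid-rank definition of the percentile rank are assumptions made for illustration.

```python
def pool_daily_logs(daily_logs, test_name):
    """Collect every raw score recorded for one test across the daily logs.

    Each log is a list of (employee, test name, raw score) entries; this
    layout is hypothetical.
    """
    return [raw for log in daily_logs for (_, test, raw) in log if test == test_name]


def percentile_rank(raw, norm_scores):
    """Per cent of the norm group scoring below raw, counting ties as half."""
    below = sum(1 for s in norm_scores if s < raw)
    equal = sum(1 for s in norm_scores if s == raw)
    return 100.0 * (below + 0.5 * equal) / len(norm_scores)


# Two days of illustrative entries (hypothetical data).
logs = [
    [("e1", "Number Speed", 12), ("e2", "Number Speed", 18)],
    [("e3", "Number Speed", 15), ("e4", "Number Speed", 18),
     ("e5", "Clerical Aptitude", 40)],
]
local_norms = pool_daily_logs(logs, "Number Speed")
print(local_norms)
print(percentile_rank(15, local_norms))
```

The same functions would serve for service-wide norms once the logs from all installations are pooled at Headquarters.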


Validation of Tests and Establishing Critical Scores 


Attention has been given to the validation of test results 
and to the determination of critical scores. Various types of 
criterion data, such as rating scales, quality of work output in 
terms of error scores per unit of work, and quantity of work 
output, have been employed in these studies. Some of the 
investigations have been discouraging in their results, especially 
in the use of rating scales. Efforts are now being made to use 
various types of criterion data other than rating scales, to 
show the relationship between test scores and job performance. 
It is felt by the writers that the difficulty in obtaining satis- 
factory validation data and satisfactory critical scores in many 
cases has been due much more to the inadequacy of the cri- 
terion data than to the selected tests. 
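
One simple way of arriving at a critical score is sketched below: each observed score is tried as a cutoff, and the one that agrees most often with a satisfactory or unsatisfactory criterion judgment is kept. This procedure and the data are assumptions offered for illustration; the method actually used in the Quartermaster studies is not described here.

```python
def hit_rate(cutoff, scores, satisfactory):
    """Fraction of employees whose pass/fail at this cutoff matches the criterion."""
    hits = sum((s >= cutoff) == ok for s, ok in zip(scores, satisfactory))
    return hits / len(scores)


def best_critical_score(scores, satisfactory):
    """Return the observed score giving the highest hit rate.

    Ties are broken in favor of the lower cutoff (an arbitrary choice).
    """
    candidates = sorted(set(scores))
    return max(candidates, key=lambda c: (hit_rate(c, scores, satisfactory), -c))


# Illustrative data: test scores and a criterion judgment for six employees.
scores = [10, 12, 15, 18, 20, 22]
satisfactory = [False, False, True, False, True, True]

cutoff = best_critical_score(scores, satisfactory)
print(cutoff, hit_rate(cutoff, scores, satisfactory))
```

A cutoff found this way is only as trustworthy as the criterion judgments fed into it, which is the difficulty the writers point to above.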

Quartermaster Testing Handbook


In view of the establishment of general testing policy of the
Quartermaster Corps and the expanding testing programs 
throughout the installations, a handbook on testing has been 
prepared. This handbook is divided into two main parts. The 
first part discusses the role of testing in terms of the contribu- 
tion that testing can make to various phases of the depot pro- 
gram, including the responsibility of the Commanding Officer, 
the Civilian Personnel Officer and the Chiefs of Training, Placement,
and Employee Relations. The second part discusses the
more technical phases and procedures of testing and is intended 
primarily for the personnel who make up the staff of Testing 
Sections. The manual describes the Quartermaster Testing 
Program in considerable detail and reflects the experiences 
gained in establishing Testing Sections or Branches in several 
installations. 

Summary 


The service-wide testing program which has been planned 
and implemented from the headquarters level in the Quarter- 
master Corps shows considerable promise. It may well point 
the way in industrial testing, not only for other technical services
in the Army Service Forces but for industry as well. The
place of testing has been carefully defined as an inherent part 
of an over-all personnel program on a service-wide basis, in- 
volving some 80,000 civilians employed in hundreds of different 
jobs. 

Briefly stated, the functions of the Testing Units as set up 
within the installations under the jurisdiction of the Quarter- 
master General are as follows: 

1. Select appropriate batteries in terms of job require- 
ments. 

2. Administer, score and interpret all tests used in con- 
nection with placement of new employees, determination of 
training needs, evaluation of training, and employee relations 
activities. 

3. Conduct testing surveys at the request of operating 
officials, and provide suitable reports of results through proper 
channels.

4. Conduct the necessary research to establish local norms
and to determine the validity of experimental tests. 

5. Construct new tests as required. 

6. Maintain necessary records. 

7. Coordinate testing with other personnel activities. 

Testing is only one aspect of the Civilian Personnel pro- 
gram in the Quartermaster Corps. But it is an important aspect. 
It provides valuable information concerning the employee’s 
abilities, aptitudes and skills, obtained in a relatively short 
period of time. Tests, wisely used and with due consideration 
to their limitations, enable the immediate location of the more 
apt workers and the more trainable employees. Tests are 
making an important and vital contribution to the war effort 
by insuring the maximum utilization of manpower. It can 
truly be said “Tests have gone to war on the civilian as well 
as the military front.”


TESTING BY MEANS OF FILM SLIDES WITH
SYNCHRONIZED RECORDED SOUND 


HERBERT A. THELEN 
The University of Chicago 


1. Preliminary Considerations 


FUNDAMENTALLY the method of evaluation is to put the 
student into situations likely to result in experiences engen- 
dering overt responses which can be used for valid prediction 
of behaviors assumed to constitute the goals of education. 
The implementation of this method is seen to require a clear 
statement of educational objectives, the setting up of con- 
trolled patterns of stimuli appropriate to the level of maturity 
and other characteristics of the students, the description of 
overt responses of each student, and the generalization of the 
descriptions of these responses into predictions of types of 
response in a wide range of similar situations, and finally, the 
evaluation of the degree of appropriateness of the responses 
of each student as compared with those of his classmates. 

For the purposes of this discussion of a new type of test, 
we shall postulate that by a “test” we mean an instrument 
which measures status of a student relative to certain objec- 
tives and relative to a group of students tested at the same 
time. We shall further assume that the instrument is con- 
structed after the specific objectives have been described 
operationally, and that the test situations have been formu- 
lated to elicit behaviors to be appraised with due regard for 
appropriateness of subject-matter content, level of maturity, 
and type of problem tension engendering the observed re- 
sponses. These assumptions are required in order that dif- 
ferent media of tests may be compared. 

A test is of little value if the results of testing cannot be 
interpreted for some clearly stated purpose. Probably the 
major factors in the interpretation of a test to which the above
postulates apply are: (1) adequacy of available description of 
the testing situations, (2) degree of insight of the interpreter 
into the psychology of learning and behavior, and (3) knowl- 
edge of the culture, maturity, tool skills, familiarity with situ- 
ations similar to those in the test, and other relevant attributes 
of the students tested; for these three factors determine the 
validity¹ of description of the behaviors presumed to account
for the responses checked in each test item. Certain aspects 
of these factors may now be considered in detail. 

A. The Testing Situation. At the operational level, a test 
is generally a piece of paper with writing on it. For most 
students it presents the dominant stimuli in a total situation 
which includes the test administrator, a group of students, 
the physical environment, and the temporal place of this sit- 
uation with respect to an undefined sequence of experienced 
situations, both prior and subsequent, each of which has en- 
gendered or anticipated a more or less relevant experience of 
the student. The only factors which may be clearly described 
in the testing situation relate to the test itself, and, with 
lesser comprehension, to the procedure for administering the 
test. If the test actually does elicit the full attention of the 
student, then the remaining physical factors are assumed to 
be of negligible importance in determining the score, and the 
factors of relevant previous experience are assumed to be the 
major cause of differences among the scores of the students. 

It follows that to describe the testing situation adequately, 
one must know: the effectiveness of motivation of each student, 
the exact procedure for giving instructions (and making sure 
that they are understood), the relevant physical and social 
conditions during administration of the test, and, finally, the 
nature of the test items (language used, problem-tensions 
aroused, attitudes implied, slogans employed, and the like). 
Since the extent of the motivation of a student cannot be 
measured directly, and can be inferred only in the case of 
some tests giving a pattern of scores, it is customary to assume 


1 Defined as amount of correspondence between predicted and observed
behaviors in the entire population of problems sampled by the test.

that all students are either equally or maximally motivated.
Since the extent of a student’s understanding of the directions 
for taking the test is not measured apart from his achievement 
on following the directions, it is assumed either that the under- 
standing of directions is perfect or else that ability to under- 
stand directions is part of the achievement being appraised; 
the latter is probably the sounder assumption. The usual 
assumption about physical and social conditions is that good 
lighting, adequate ventilation, and an empty seat between 
adjacent students are about the only relevant factors. Since 
a test should not be given if it is inappropriate for the group 
being tested or if it does not measure the desired objectives, 
it follows that use of a test assumes that it will be valid under 
the conditions in which it is used. This assumption can be 
justified only by detailed analysis of the individual items with 
reference to such factors as those suggested above. The neces- 
sity for making these or similar assumptions confronts the 
interpreter of any test. The justification for these assump- 
tions may, however, vary from one test situation to another. 
It would be desirable to make these assumptions as valid as 
possible. 

B. “Verbal” versus “Real” Situations. If a test is used 
merely as a hurdle to be cleared for the sake of a certificate, 
then, within wide limits, the nature of its items makes little 
difference so long as the distribution of scores covers a wide 
range and the students opined to be “best” receive the highest 
scores, and the students believed to be “poorest” receive the 
lowest scores. Such a test used in this way does not require 
discussion here because the preliminary postulates do not apply 
to it. The statement that “specific objectives have been de- 
scribed operationally” means that the major, generally-stated 
objectives have been analyzed and broken down into a large 
number of distinguishable types of action. Each of these types 
has its own properties and relationships with other types; these 
relationships constitute the rationalization of the general objec- 
tive. In a course for which such an instrument is appropriate 
as an aid to evaluation, there has been some effort to teach 
the students the types of behavior, the criteria by which the 
most appropriate type may be selected in a given situation,
the rationalization of the general objective, and the scope of 
applicability (range of situations and purposes) of the objec- 
tive. These objectives are generally understood to refer to a 
wide range of everyday experience; but their measurement is 
essayed through symbolic verbal experience. Under these 
conditions it is not difficult to come to the conclusion that all 
mental processes come under the heading of “reading compre- 
hension” and that therefore the major task of schools is to 
teach students to read. 

Situations as conveyed by paper-and-pencil tests differ from
the situations encountered in everyday life in some very im- 
portant respects, and these also present the test interpreter 
with the need for making a number of assumptions whose valid- 
ity is generally not easily appraised. Some of the factors 
involved are: 

(1) Ambiguity. Through verbal symbols standing for 
objects one attempts to engender the same response that the 
objects themselves would produce. Since object-names stand 
for classes rather than for individuals, a number of qualifying 
adjectives (or stated properties) must be given to specify the 
individual. But the adjectives themselves are usually asso- 
ciated with outstanding types of objects to which the adjec- 
tives best apply (within the experience of the reader), so that 
the reader is confronted with the task of visualizing a specific 
pattern of aspects (which in turn can never completely re- 
produce an object) from what amounts to a series of generali- 
zations about varieties of types of objects. It would be quite 
surprising if such a description meant the same thing to two 
different students!

(2) Semantic misrepresentation. The particular pattern
of aspects through which the tester attempts to convey the
object to the reader’s experience may or may not coincide with
the pattern of aspects by which the reader symbolizes or would
symbolize the object.

(3) Incompleteness of pattern of stimuli. All the senses 
together enter into the experiencing of an object or situation. 
Presumably the number of possible kinds of response would 
depend partly upon the complexity of the pattern of stimuli
(in this case represented symbolically). There is no adequate 
symbolic notation to even represent odors, tastes, or surfaces; 
and therefore aspects ordinarily obtained through these types 
of perception cannot be given. This is one kind of cultural 
limitation imposed upon test items. It may be argued that 
this limitation is unimportant because we make little use of 
perceptions from these senses (since they cannot be repre- 
sented—a vicious circle), yet there is undoubted depreciation 
of the validity of the vicarious experience engendered within 
these limitations. 

(4) Non-simultaneous presentation of the pattern of as- 
pects. In reading, images are built up piecemeal, and with 
extremely attenuated impact. Furthermore, the order of pres- 
entation of the components from which the pattern is formed 
is fixed. All three of these factors are artificial and may be 
expected to give the test situation different meaning from that 
of the corresponding “real” situation. 

(5) Selection of aspects. Any symbolic system of repre- 
sentation proceeds by selection of relevant aspects. This 
focuses or fails to focus attention upon details whose impor- 
tance is thus magnified or diminished out of proportion to the 
other details in the situation. 

The above theory is presented to rationalize by means of 
symbolic representation some of the experienced difficulties 
with paper-and-pencil tests. In general, it may be said that 
these tests present artificial situations to which the range of 
kinds of response is limited, and that facility in manipulation 
of verbal symbols is an important factor which masks to some 
unknown degree the nonreading abilities to be measured. The 
use of such tests has led to emphasis upon learning of self- 
contained relationships among symbols rather than upon phe- 
nomenal aspects represented by the symbols—students are 
taught “maps” rather than “territories.” 

It seems reasonable to suppose that instruments dealing 
with experience solely at the verbal symbolic level may, never- 
theless, be of some use in evaluating abilities which are defined 
in terms of behaviors guided largely by verbal maxims or 
conventions. Although the situations eliciting the behavior
are subject to the difficulties outlined above, the behaviors in 
these situations may be largely explained as manipulations 
with verbal criteria. 

The behaviors encompassed under the generic title of “crit- 
ical thinking,” because of their high symbolic loading, can 
be appraised much more satisfactorily than can “attitudes.” 
Thus judgments in terms of stated or unstated criteria, or in 
terms of logical rules or scientific methodology may be elicited 
with paper-and-pencil items. The major discrepancy may 
here be found in that the behavior starts after verbalization 
with the verbal items, but before verbalization with the non- 
verbal items. The appraisal is then limited to a part of a 
process rather than to a whole process. If this limitation is 
recognized, however, interpretation may be apparently sound. 

One obvious way to overcome the difficulties inherent in 
a verbal presentation of situations is to place the student in a 
more or less controlled “real” situation and then observe his 
behavior. This sort of technique usually involves individual 
testing of each student by a specially trained observer; the 
results may be expected to be less reliable but more valid. 
Even here, however, the testing situation differs from the 
population of situations about which we wish to make predi- 
cations in that the problems are formulated or at least sug- 
gested by the observer; that is, the tension resulting in prob- 
lem-solving behavior of the testee is stimulated by the ob- 
server rather than by the configuration of naturally occurring 
elements within the situation itself. This technique is ad- 
mittedly cumbersome and expensive. 

The present investigator has become interested in the possi- 
bilities of the sound-slide medium for reducing the loading of 
verbal symbolism and increasing the participation of students 
in testing situations. The test so far constructed will be de- 
scribed and then tentatively evaluated by means of some of 
the concepts stated above. 


2. Description of the Test 


A. Nature of the Instrument. The test to be described
simultaneously presents a controlled pattern of stimuli
approximating “real” situations. To this extent it resembles the
controlled observation technique more than that of the
paper-and-pencil test. The overt responses of the students are limited
to recorded judgments, opinions, and analysis, and to this
extent the test is akin to a paper-and-pencil test rather than an
observational type of evaluation.

The particular instrument to be described is concerned 
with the area of behaviors generally referred to as “ability to 
apply principles.” The principles are taken from among those 
ordinarily studied in the fifth-grade physical science units at 
the University of Chicago Laboratory School. 

The test consists of a film strip, a recorded transcription, 
and answer sheets for the students. The test is given in a 
semidarkened room; the light is adequate for students to follow 
the answer sheet and dim enough for the projected pictures to 
be clearly seen. The film strip is projected one frame at a 
time, and the pictures are changed at a signal recorded in the 
transcription. The recording provides some narration for each 
situation, authentic sound effects, and directions for marking 
each item on the answer sheet. Sixteen-inch records are used; 
these are played at 33 1/3 r.p.m. and run for seventeen minutes. 
The film slide strip presents one to five pictures per problem, 
and also contains photographed typewritten titles. Possible 
answers are presented as depicted right and wrong ways of 
doing certain jobs, written explanations or principles, depicted 
members of analogies, depicted illustrations of operation of 
principles, and other devices deemed appropriate to the specific 
objective being tested. A few of the problems require brief 
written statements. 

The present test was given in Grades V, VII, VIII and X. 
The operator stopped the transcription to allow sufficient time 
for all the students to record their answers for all the items. 
The pauses provided in the transcription were approximately 
correct for the tenth-grade students, but had to be lengthened 
for the others. The test required thirty-three minutes in the 
tenth grade and about forty-eight minutes in the fifth. 

B. The Test Items and Objectives. The test presents 
nineteen problems focused on ten specific stated objectives in 
the area of application of principles in elementary science.
The following brief description of the objectives and items may
suggest sorts of possibilities for testing with this medium:

Objective I. To recognize a practical (unstudied) application
of a principle studied in class.


Problem 1: A ruler clamped in a vise is plucked, and the
sound is heard. Then a shorter ruler is plucked, and the
sound is heard. The students are asked to consider three
depicted ways of getting different notes from a violin: by
turning a peg, playing open strings, or playing up and down the
scale. For each of these three situations they record that the
notes are different for the same reason that the notes from
the ruler are different, that the notes are different for some
reason other than that shown with the ruler, or that there is
not enough evidence to decide between the alternatives.²


Objective II. To arrange events in a temporal sequence
according to a developmental principle.


Problem 2: Three pictures designated A, B, and C show a 
man using a lathe in the construction of a gadget, using me- 
chanical drawing instruments in designing the gadget, and hav- 
ing an inspiration for the gadget. The student writes the 
letters A, B, and C, in the order indicating the sequence as he 
thinks it really occurred. 


Objective III. To recognize (from a studied principle) the 
best technique for solving a simple problem. 


Problem 3: Situation: Grease in a skillet has caught fire. 
Student selects the better depicted method of putting out the 
fire. Choices: Clapping a lid on the skillet; running water 
into the skillet. 


Problem 4: Situation: Balancing of large weight and small 
weights hung from a differential pulley. Choices: Large 
weight hung from smaller diameter; large weight hung from 
larger diameter. 


Problem 5: Situation: Location of tray for fastest freezing in
the freezing unit of a refrigerator. Choices: Lower right-hand
corner of freezing unit; middle top shelf of freezing unit.


Problem 6: Situation: Position of head to hear a tuning fork 
loudest. Choices: One ear directed toward the fork; face 
turned toward the fork. 


2 This is an analogy between a studied laboratory situation and an unstudied 
practical experience. The converse objective, recognition of a laboratory setup 
which operates on the same principle as a familiar practical process or device, was 
not tested. 
In these problems the student selects the better choice,
selects both choices as being satisfactory, or rejects both
choices.

Objective IV. To recognize situations illustrating the operation
of a stated principle.


Problem 7: Stated principle: A smaller force can overcome a 
larger force provided it moves farther and faster than the 
larger force. Situations: Boy lifting handles of a wheelbar- 
row; jackscrew raising a heavy load; single fixed pulley sus- 
pending two equal weights. 

Problem 8: Stated principle: Sound is produced by the
vibrations of objects. Situations: Block of wood hit with
hammer; water poured from pitcher to glass; whistle being
blown.³


In these problems the student rates each situation as illus- 
trating the principle, not illustrating the principle, or as in- 
sufficiently described for a decision to be reached. 

Objective V. To identify a simple familiar mechanism or 
process. 


Problem 9: Find the wedge. Situations: Driving a nail; 
sawing wood; pulling a cart up an inclined plane. 
Problem 10: Find the sound being reflected. Situations: 
Boy shouting “around a corner”; man shouting in a large 
empty room; hammer in piano striking a string. 


In these problems the student rates each situation as de- 
picting the mechanism or process, as not depicting the mech- 
anism or process, or as insufficiently described for a decision 
to be reached. 

Objective VI. To compare predicted (from familiar, unstated 
principle) results with observed results in simple laboratory 
situations. 


Problem 11: What is wrong with this picture? Situation:
China dish is shown being heated by the luminous flame of a
Bunsen burner. Then the flame is turned off, and the dish
is seen to be clean.

Problem 12: What is wrong with this picture? Situation:
Bimetallic bar is shown to be uncurved in a hot flame, and
curved after cooling.

3 This problem may also be regarded as a sort of logical tautology, since any
case of sound production must “illustrate” the principle. The tenth-grade students
were probably sensitive to this aspect, whereas the lower grades distinguished
between the whistle (a musical instrument) and the other two sources (noisemakers).

In these problems the student rates the second picture as
depicting a correct result, or else explains briefly what is 
incorrect. 

Objective VII. To rate statements of principle or fact as 
useful in explaining depicted phenomena. 


Problem 13: Phenomenon: Whispering is transmitted by a 
garden hose. Statements: “Some solids transmit sound better 
than air does.” “Sound is reflected by surfaces.” “Whisper- 
ing is higher pitched than talking.” “Sound travels outward 
in all directions through a gas.” “Sound is partly absorbed 
at surfaces.” 

Problem 14: Phenomenon: Pitch of a tuning fork is the same 
whether hit hard or softly, whether held in the air or mounted 
on a resonance box. Statements: “Rate of vibration of an 
object depends upon its size.” “Loudness of a sound depends 
upon how much the source vibrates.” “Sound is produced 
by vibrating objects.” “The number of overtones in a sound 
depends upon the construction of the source.” “The rate of 
vibration of a string depends upon its tension.” 

Problem 15: Phenomenon: Man sitting on stepladder in a
room is warmer than he would be on the floor. Statements:
“The floor conducts heat more rapidly than the ceiling.”
“Heat rises because it is lighter than cold.” “Less dense
objects float in a more dense liquid or gas.” “A certain weight
of air occupies more space when it is hot than when it is cold.”
“Heat is due to moving molecules and therefore hot things
move more rapidly than cold things.”⁴


In these problems the statements following the description 
of the phenomenon are flashed on the screen one at a time. 
The student rates each statement as being helpful in the ex- 
planation, or as not being helpful in the explanation. 
Objective VIII. To identify an incorrect postulate in the de- 
picted solution of a problem. 


Problem 16: To tune a viola. Depicted solution: The viola
is tuned aurally, and the position of each peg marked by means
of a scratch. From then on, the instrument is tuned by
turning the peg to line up the index and scratch. The experiment
is tried, and the viola heard to be out of tune.

Problem 17: To keep cool on a hot day. Depicted solution:
A boy is shown putting on successively a sweater, a coat, and
a blanket. In dialogue with the narrator he explains that
this will keep the hot air from reaching him, but admits that
he is still hot.

4 The criteria or conventions of explanation tested are concerned with relevance,
[illegible] of statement, closeness of analogy, description of mechanism, and the
like.


In these problems the student writes a brief criticism of 
the solution shown. 
Objective IX. To select the best stated prediction in a
practical unstudied situation.


Problem 18: Situation: Taking the nut off a large bolt. A 
pipe has been slipped over the wrench handle, and a man is 
pulling on the pipe. Stated predictions: “The pipe will bend.” 
“The nut will come off.” “The bolt will be twisted in two.” 
“Nothing will happen.” 


The student indicates the prediction he thinks most tenable. 
Objective X. To select the most appropriate opinion (verbal 
expression of attitude) about the desirability or undesirability 
of a depicted situation. 


Problem 19: Situation: A pile of trash and old newspapers 
in the corner of a “dark, warm basement” is shown. Four 
opinions expressing different degrees of alarm over possible 
danger are given. 


The student selects the opinion which he most nearly agrees 
with. 

Problem 3 is depicted in full. The technique of testing 
includes the following steps: 

(1) Presentation of a title or statement designed to gain 
interest, to indicate the general nature of the task, and to 
mark the beginning of a new problem. The title is read by 
the narrator while it is on the screen. 

(2) Description and depiction of the problem situation. 
Pictures are arranged in sequence, and, together with the nar- 
ration, tell a simple story. 

(3) Presentation of answers from which to select. These 
may be depicted ways of doing things, verbal statements 
(which may or may not be read by the narrator, depending 
upon the objective) of explanation or prediction, and the like. 
In some problems the student is asked to write in his explana- 
tion or criticism (short-answer essay). 








44 EDUCATIONAL AND PSYCHOLOGICAL MEASUREMENT 


(4) Giving of directions by the narrator for answering 
the item. 

(5) Allowance of sufficient time for all students to mark 
the answer sheet. 

Steps 2 and 3 may occur simultaneously. Step 4 may 
precede 3, or even 2. 


Problem 3

Film Slides (titles          Recording (narration, directions, sound effects)
and pictures)

What is the right way        Narrator: What is the right way to put out the
to put out the fire          fire in a skillet?
in a skillet?                Sound Effect: Bell signal.

(See Plate 1)                Narrator: This sort of thing sometimes happens
                             when we heat a skillet with grease in it.
                             How shall we put out the fire?
                             Sound Effect: Bell signal.

(See Plate 2)                Narrator: Is covering it with a lid a good way
                             to do it?
                             (Pause)
                             Sound Effect: Bell signal.

(See Plate 3)                Narrator: Or would it be better to pour water
                             on it?
                             (Pause)
                             In answer space 5, write A if the first way was
                             the right way to put out the fire, write B if the
                             second way was the right way to put out the fire,
                             write both A and B if both were correct, or draw
                             a dash if neither way was correct. (Pause,
                             stopping transcription if necessary, until all of
                             class has marked the answer space.)
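
The instruction to use answer space 5 for Problem 3 is consistent with a simple tally, sketched below in Python, in which every rated situation, rated statement, or written response takes one answer space. That one-space-per-response rule, the per-problem counts read off the descriptions of the nineteen problems, and every name in the code are assumptions made for illustration; the answer sheet itself is not reproduced in the article.

```python
# Hypothetical tally of answer spaces, assuming one space per response.
# Counts are read from the problem descriptions; the real answer sheet is not shown.
RESPONSES_PER_PROBLEM = {
    1: 3,                       # three depicted ways of getting different notes
    2: 1,                       # one written sequence of the letters A, B, C
    3: 1, 4: 1, 5: 1, 6: 1,     # one choice (A, B, both, or dash) each
    7: 3, 8: 3,                 # three situations rated against a stated principle
    9: 3, 10: 3,                # three situations rated for a mechanism or process
    11: 1, 12: 1,               # one judgment or brief explanation each
    13: 5, 14: 5, 15: 5,        # five statements rated each
    16: 1, 17: 1,               # one brief written criticism each
    18: 1,                      # one selected prediction
    19: 1,                      # one selected opinion
}


def first_answer_space(problem):
    """First answer space used by a problem under the assumed tally."""
    earlier = sum(n for p, n in RESPONSES_PER_PROBLEM.items() if p < problem)
    return earlier + 1


def total_answer_spaces():
    """Total number of answer spaces under the assumed tally."""
    return sum(RESPONSES_PER_PROBLEM.values())


print(first_answer_space(3))   # 5, matching the narrator's direction
print(total_answer_spaces())   # 41
```

Under these assumptions the sheet would carry 41 answer spaces. This is only a plausibility check on the description, not a reconstruction of the scoring actually used.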














C. Results of Testing. By means of this instrument utiliz- 
ing the sound-slide medium, it was desired to explore a number 
of possible types of items and situations. The brief sampling 
of the various objectives precludes expectation of any high 
degree of reliability with respect to single objectives.
Furthermore, if the objectives actually are different types of behaviors,
then the test as a whole would not be expected to have high 
internal reliability. On the other hand, the test could be 
considered valid if predictions guided by an acceptable theory 
of learning and based upon knowledge of the learning experi- 
ences of the students were borne out by the test results. It 
was believed reasonable to suppose that this test, presenting 
the stimuli simultaneously and with a minimum of verbal 
symbolic representation, should have high face validity. To 
test this hypothesis a series of predictions about the results 
of testing were set up and studied. The predictions were:

(1) Accuracy on the test as a whole will increase with 
the grade level of the students. 

(2) Increase in the appropriateness of responses in par- 
ticular items will spurt between the grade levels above and 
below which the relevant principles were studied. 

(3) There should be some evidence of increasing maturity 
of thought discernible in the pattern of responses as grade
level increases. While such patterns have not yet been de- 
scribed adequately, there should be agreement with such frag- 
ments of information as are now available. 

The results bearing upon these three predictions are: 

(1) Median scores: fifth grade 16.8, seventh grade 19.5, 
eighth grade 21.0, tenth grade 23.5.⁵

(2) The placement of principles in the science curriculum 
of the Laboratory School has been relatively constant for 
several years, but the courses have been taught by several 
teachers, some of whom are no longer in the school. Knowl- 
edge of the learning experiences of the students was consistent 
with observed spurts in accuracy relative to all the items for 
which such knowledge was available. In other words, the 
test results reflected the learning experiences faithfully so far 
as they could be described. 

5 The large influx of students new to the school in the ninth grade may result
in a somewhat low median for the tenth grade as compared with the other grades.
No attempt was made to match the samples of students because there is no good
reason to suppose that the students in Grades V, VII, and VIII are different in
any inappropriate dimension.

6 About half the responses.
(3) The evidence concerning the changes in pattern of
accuracy from grade to grade is meager and subjective.
Regarded as an empirical finding, it would be worthless, but its
consonance with parts of the pattern of anticipations increases
the validity of the instrument by some unknown amount.

(a) The test key calls for two responses that “more
information is needed for a decision.” In Grade V the
accuracy was approximately that expected by chance;
there was a decrease in accuracy up through Grade X.
This is in agreement with the common observation
from Interpretation of Data Tests that “tendency to
go beyond the data” increases with grade level (in
the absence of special training).

(b) Accuracy of rejection of the irrelevant reasons in prob- 
lems 13, 14, and 15 increased markedly between the 
eighth and tenth grades, but it cannot be shown that 
subject matter which may have been presented during 
the ninth grade does not account for the gains. 

(c) The decline in accuracy with principles known to 
have been studied in the fifth or sixth grades and not 
reviewed subsequently appeared to depend upon the 
directness of applicability of the principle as learned. 
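
Prediction (1) can be checked directly against the medians reported in result (1). The short Python sketch below does so and also computes an average gain per grade step between adjacent tested grades; that per-grade gain is an illustrative summary added here, not a statistic reported in the article, and the function names are hypothetical.

```python
# Median scores as reported in result (1), keyed by grade level.
MEDIAN_SCORES = {5: 16.8, 7: 19.5, 8: 21.0, 10: 23.5}


def increases_with_grade(medians):
    """True if every higher tested grade has a strictly higher median."""
    grades = sorted(medians)
    return all(medians[lo] < medians[hi] for lo, hi in zip(grades, grades[1:]))


def gain_per_grade(medians):
    """Average gain in median per grade step between adjacent tested grades."""
    grades = sorted(medians)
    return {
        (lo, hi): (medians[hi] - medians[lo]) / (hi - lo)
        for lo, hi in zip(grades, grades[1:])
    }


print(increases_with_grade(MEDIAN_SCORES))  # True
for span, gain in gain_per_grade(MEDIAN_SCORES).items():
    print(span, round(gain, 2))
```

A monotone rise of this kind supports prediction (1) only; predictions (2) and (3) depend on item-level data that the article summarizes but does not tabulate.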


3. Criticism and Evaluation of the Medium 


The discussion under “Preliminary Considerations” above 
provides several criteria which may be used in evaluating and 
criticizing the type of test here dealt with. 

Two facts give us some assurance that the testing situation 
is controlled and therefore definable to a high degree: (1) All 
instructions for taking the test are recorded, and this fixes the 
factor which is usually the most variable in the administration 
of tests. (2) The test holds the interest of the students and 
motivates them to work intensively. This is stated as a fact 
as a result of discussions of the test with the classes taking it, 
and as a result of observations of behavior of the students 
while taking it. 

The “realness” of the test situations is greater than with 
paper-and-pencil tests. Consequently it should enable more 
valid predictions as to the behavior of students in similar “real”
situations, and this type of prediction is assumed to be the 
most legitimate purpose of achievement testing. The use of 
motion pictures for depicting some of the situations involving 
changes along the time dimension would presumably increase 
the “realness” further (as would also the use of stereoscopic
pictures and color). Whether this increase would justify the
increased expenditure of effort in making the test is not known;
careful analysis of the objectives and situations would enable
one to set up hypotheses.

The sound-slide medium very much minimizes the cus- 
tomary use of verbal symbols in conveying the situations; this 
should make possible the evaluation in the lower grade levels 
of some behaviors hitherto not readily available for testing. 
(An illustration is the identification of assumptions in prob- 
lem 16.) The minimization of reading comprehension as a 
prime factor in determining the student’s responses should also 
make possible the testing of many objectives more directly. 

The more complete presentation of situations by picture 
and sound means that the pattern of stimuli comes closer to 
actual experience. Coupled with the advantages listed above 
may be an increased difficulty of “focusing” items so that the 
student does not respond unduly to irrelevant stimuli. In 
other words, the more completely the situation is conveyed, the 
greater the number of possible types of response, and care 
must therefore be exercised in stating the question unam- 
biguously so as to elicit the type of response which is most 
informative in the evaluation of the objective to be appraised. 


4. Plans and Suggested Possibilities 


Other factors being equal, the more adequately a situation 
is presented, the more valid the response. It seems reasonable 
to suppose that this medium may have interesting potenti- 
alities for the appraisal of attitudes. Instead of stating an 
opinion as to preference in verbalized general situations, a 
student might be asked to criticize a depicted course of action, 
or to choose among several depicted solutions to a problem 
involving a conflict in values. Instead of having to select the 
relevant aspects of a situation for the student (as one must
do in verbal presentation), it would be possible to present 
subtle but crucial factors disguised in situations. In this 
event the responses of the student might be governed more 
by the values he lives by and less by the slogans he has learned. 


The science department of the Laboratory School is working
as a group on the construction of a sound-slide test to
appraise ability to form reasonable conclusions. A variety of
situations in and out of science will be used in an effort to
find out whether this ability can be described as an entity apart
from associated learnings of subject matter. The identifica- 
tion of a number of such abilities plus the development of 
adequate means of appraisal would make possible some sig- 
nificant research on teaching methods, and might well lead 
to a complete reorganization of the content of elementary 
science courses.


5. Summary


A new type of test making use of pictures with synchron- 
ized narrative, sound effects, and instructions is described. 
The use of such a test for appraising some aspects of ability 
to apply elementary principles in science is explored. 

Advantages claimed for the sound-slide test are: (1) uni- 
formity of administration of the test from group to group, 
(2) high motivation of the students, (3) minimization of the 
verbal element with increased validity of testing some objec- 
tives, (4) possibility of appraisal of some fairly sophisticated 
objectives at low-grade levels. 


ANALYSIS OF THE TERMAN-McNEMAR TESTS
OF MENTAL ABILITY 


F. T. TYLER 
The University of British Columbia, Vancouver, B. C. 


The Terman Group Test of Mental Ability was probably 
one of the most commonly-used group intelligence tests over 
the period 1921-1941 (8, p. 33). It is likely, therefore, that 
the revision of this test will be of considerable interest to 
school officials. Analysis of the revised form by Terman and 
McNemar should give valuable information to supplement the 
manual of directions. The purpose of this paper is to present 
the results of such an analysis. 


The Subjects 


The subjects were students in the junior high school at 
Nelson, British Columbia, where it is the practice to administer
group intelligence tests in grades 7 and 9. Approximately
100 students took Form D of the TMcN² tests in September,
1942; forty-nine of these had previously taken the
KA tests in 1940 in grade 7. The TG test was administered 
to 71 of the grade 9 pupils in October, 1942. Form C of the 
TMcN test was given in February, 1943, to 88 of the grade 9 
students for whom Form D scores were already available. 
Comparisons between I.Q.’s on the various tests are shown 
in Table 1. 

The average TG I.Q. in grade 8 in the Vancouver, B. C.,
school system was 106 in 1940 (12, p. 106), rising to 115 in
grade 12. It is likely, therefore, that the average I.Q. in
grade 9 is about 108 or 109. The subjects used in the present
study appear to constitute a typical sample, although possibly
slightly above average. It should be noted from the table
that the average DIQ of 88 cases is three or four points below
the average for 49 cases. It seems reasonable to expect,
therefore, that the average KA and TG I.Q.’s for the whole 88 cases
would be somewhat lower than those for only 49 students, so
that the average TG I.Q. approximates that found in the
Vancouver schools.

1 “Careful studies of validity and reliability coefficients and norms presented by
test authors are all too rare” (9, p. 16).

2 The following abbreviations are used throughout: TG, Terman Group; KA,
Kuhlmann-Anderson; TMcN, Terman-McNemar; DIQ, deviation I.Q.; and RIQ,
ratio I.Q. computed from TMcN tests.

TABLE 1
Means and Standard Deviations of I.Q.’s on Various Tests

                      Form D          Form D          Form C
     KA      TG       RIQ     DIQ     RIQ     DIQ     RIQ     DIQ
     1940    1942     1942    1942    1942    1942    1943    1943
N    49      49       49      49      88      88      88      88
M    109     113      122     113     115     109     116     110
σ    11.30   10.65    20.62   11.65   19.73   13.40   18.02   13.08

σ, manual of directions: 29.10, 17.10

Despite the apparent differences between RIQ’s and DIQ’s,
the correlations between them are very high, being .92 and .94
for Forms C and D, respectively, practically identical with the
relationship given in the manual of directions, namely, .92 for
Form C. As the authors state: “From these data it will be
seen that the rank order of deviation and ratio I.Q.’s is very
nearly identical, but that the magnitude of the I.Q.’s will vary
in increasing amount as one moves away from the mean” (11,
p. 10). This also accounts for the fact that the mean RIQ
of the present sample is larger than the mean DIQ. The
difference in value between a student’s two I.Q.’s need not concern
the teacher if he understands that a difference is to be expected
because of the differences in the standard deviations of the two
types of scores. The manual of directions might have been
more explicit on this point for the benefit of those teachers
who are relatively unfamiliar with the meaning of a standard
deviation. The authors recommend the use of the DIQ, but
the RIQ will be used in many schools because teachers are
more familiar with its definition.
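The effect of unequal standard deviations can be illustrated with a short sketch. It assumes, purely for illustration, that a pupil holds the same relative standing (the same number of standard deviations from the mean) on both scales; the manual's own conversion is not reproduced here. The means and sigmas are the Form D values for the 88 cases in Table 1, and the function names are hypothetical.

```python
def equivalent_score(score, mean_from, sd_from, mean_to, sd_to):
    """Map a score to another scale at the same distance from the mean in sigma units."""
    z = (score - mean_from) / sd_from
    return mean_to + z * sd_to


# Form D, N = 88 (Table 1): DIQ mean 109, sigma 13.40; RIQ mean 115, sigma 19.73.
DIQ_MEAN, DIQ_SD = 109.0, 13.40
RIQ_MEAN, RIQ_SD = 115.0, 19.73


def riq_minus_diq(diq):
    """Gap between the RIQ of equal standing and the given DIQ (illustrative assumption)."""
    return equivalent_score(diq, DIQ_MEAN, DIQ_SD, RIQ_MEAN, RIQ_SD) - diq


for diq in (109, 122, 136):
    print(diq, round(riq_minus_diq(diq), 1))
```

Under this assumption the gap equals the difference between the two sample means for a pupil at the mean and widens for brighter pupils, which is the pattern the authors of the manual describe.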
Findings
1. Correlations between Various I.Q.’s. 
Table 2 shows the correlations between the various I.Q.’s 
obtained in the present study. 


TABLE 2
Correlations between I.Q.’s

Variables          N      r*
[illegible]        55     .79
[illegible]        68     .78
[illegible]        68     .86
[illegible]        71     [illegible]
[illegible]        71     .84

* All correlations are statistically significant.
† Form D was used in this part of the analysis.


The correlations between I.Q.’s are very similar, none of 
the differences being significant. The TMcN I.Q.’s agree as well 
with the other tests as the others agree with one another. The 
coefficients are similar to those usually reported between group 
intelligence tests. 


2. Difficulty of the Tests. 
Means and standard deviations of various scores are shown
in Table 3.

TABLE 3
Means and Sigmas of Scores on Tests and Subtests

                        Form C              Form D
Subtest                M        σ          M        σ
1                     18.0     3.61       16.9     [illegible]
2                     11.6     5.18       11.9     4.80
3                     16.3     3.60       15.1     4.24
4                     17.9     2.60       17.2     3.21
5                     17.3     4.05       16.3     3.85
6                     12.8     4.02       11.8     4.07
7                      9.9     1.85        8.7     2.37
Total raw score      103.5    18.80       97.8    19.70
[illegible]           17.5     2.44       16.9     2.51
Standard scores      116      11.79      114      12.31

It may be seen that the two forms are distinctly comparable
in difficulty and variability. The average percentages of all
items passed are 64 and 59 for Forms C and D, respectively.
The authors report average difficulty values of about 56 per 
cent for grades 7, 9, and 11. The higher average per cent 
success on Form C than on Form D for the present sample is 
explainable in terms of growth, with possibly some practice 
effect. 

The subtests vary considerably in mean difficulty and vari- 
ability, subtests two and six being significantly more difficult 
than the others. 


3. Item Difficulty. 


Form D was analyzed to determine the range of difficulty 
values of each item. These are shown in Table 4. 


TABLE 4
Range of Percentage Success by Items

            Range of per cent    Per cent of items between
Subtest     success              40 and 59 per cent
1           18-98                16
2           4-92                 24
3           16-98                28
4           12-100               16
5           8-98                 4
6           14-95                16
7           59-96                0





With the exception of test 7, the range of success varies 
from a low to a high percentage in each subtest, a situation 
usually associated with maximum reliability (4, p. 32). On 
the other hand Symonds (10) and T. G. Thurstone (15) have 
shown that a test consisting of items of fifty per cent difficulty
value measures an individual most accurately. Comparatively
few of the items on this test fall within the range 40 to 59 per 
cent difficulty value. 

The authors believe that the test is essentially a power 
test, i.e., that the items have been arranged within each subtest 
in increasing order of difficulty with ample time limits. This 
claim was appraised by computing the rank order correlations 
between obtained order and test order in the subtests of Form 
D. These are given in Table 5.

TABLE 5
Rho between Obtained and Test Order of Difficulty

Subtest     rho*
1           .79
2           .85
3           .87
4           .91
5           .79
6           .86
7           .69

* All values of inferred r are significant.


The values of these correlations indicate that essentially 
the items are arranged in order of difficulty. Despite this, 
item analysis indicates that for the Canadian sample some 
items are very seriously misplaced. These results may be 
compared with those of Hovland and Wonderlic, who report 
rank order correlations between test order and obtained order 
of .46 to .75 in various forms of the Otis Self-Administering 
Test, Advanced Form (6). 
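The kind of rank-order comparison described above can be sketched as follows. This is a minimal illustration, not the author's computation: the per cent passing figures are hypothetical, and difficulty is taken as 100 minus the per cent passing, so that a perfectly graded subtest would give rho = 1.

```python
from math import sqrt


def ranks(values):
    """Return 1-based ranks, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def pearson(x, y):
    """Product-moment correlation of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def spearman_rho(x, y):
    """Rank-order correlation: the product-moment correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


# Hypothetical eight-item subtest: per cent of pupils passing each item, in test order.
per_cent_passing = [98, 92, 95, 80, 61, 70, 40, 18]
test_order = list(range(1, len(per_cent_passing) + 1))
obtained_difficulty = [100 - p for p in per_cent_passing]

print(round(spearman_rho(test_order, obtained_difficulty), 2))
```

Two pairs of adjacent items are out of order in this invented example, so rho falls a little short of 1 even though no item is badly misplaced; a high rho therefore does not rule out a few seriously misplaced items in a longer subtest.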

There is no definite way of knowing which items a student 
tried, but for purposes of this analysis it was assumed that a 
student attempted all items down to the last one he marked. 
Table 6 shows the percentages of students who marked the
last item in each subtest, i.e., the percentages who attempted
all items.

TABLE 6
Percentages of Students Attempting All Items

Subtest     Per cent
1           79
2           84
3           77
4           96
5           74
6           92
7           88





Evidently the test is essentially a power test, since such 
large numbers of subjects were able to try all items in each 
subtest. 
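The counting rule behind Table 6 can be sketched briefly. The answer sheets below are hypothetical, and the code simply applies the stated assumption that a pupil attempted every item down to the last one he marked.

```python
def last_marked(responses):
    """Return the 1-based position of the last marked item, or 0 if none was marked."""
    for position in range(len(responses), 0, -1):
        if responses[position - 1] is not None:
            return position
    return 0


def percent_attempting_all(sheets):
    """Percentage of answer sheets on which the last item of the subtest is marked."""
    n_items = len(sheets[0])
    finished = sum(1 for sheet in sheets if last_marked(sheet) == n_items)
    return 100.0 * finished / len(sheets)


# Hypothetical five-item subtest; None means the item was left unmarked.
sheets = [
    ["a", "c", "b", "d", "a"],     # reached the last item
    ["a", None, "b", "d", "c"],    # skipped one item but still marked the last
    ["b", "c", "a", None, None],   # stopped after the third item
    ["a", "c", None, None, None],  # stopped after the second item
]

print(percent_attempting_all(sheets))
```

Note that a pupil who skips an item in the middle but marks the last one is still counted as having attempted all items, which is a consequence of the assumption rather than an observed fact.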

4. Suitability of the Test at the Grade 9 Level. 


The fact that about 60 per cent of all items were successfully
passed suggests that the test might be too easy at the
grade 9 level. Table 7 shows the percentages of subjects who
obtained mental ages of 19 and over, and 20 and over, on
each form.

TABLE 7
Percentages of Students with M.A.’s of 19 and 20

                   Form C           Form D
M.A.              No.     %        No.     %
19 and over       32      36       25      25
20 and over       22      25       14      16

Since the tests fail to discriminate between the mental ages
of such a large percentage of students, and because so many
earned the maximum mental age, it seems reasonable to conclude
that the tests are too easy at the grade 9 level, and possibly
even for bright students in grade 8. This test apparently
suffers from the same weakness as did the TG test: “As Terman
points out, a child capable of earning a score of 180 or better
is under a handicap” (1, p. 157). A 12-year-old student may
earn a DIQ of 161, whereas the highest DIQ obtainable by an
18-year-old is 138 (11, Table 3). DIQ’s are probably more
satisfactory than are RIQ’s, but the test appears to be too easy
for students above grade 8. This should be verified by an
analysis of the test results of grade 11 students.


5. Reliability of Tests and Subtests. 

Reliability coefficients were determined by correlating 
scores on the equivalent forms. 

The inter-form reliabilities of the subtests vary rather con- 
siderably, being .40 and .84 for subtests 7 and 2, respectively. 
Averaging these coefficients for the seven subtests and pre- 
dicting the reliability coefficient for a test seven times as long 
(2, p. 283) gives an estimated reliability coefficient of .93, as 
compared with the obtained correlation of .94. 
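The estimate just described can be retraced with a few lines of arithmetic. The sketch below assumes that the formula cited is the Spearman-Brown formula for a lengthened test, and it uses the seven subtest coefficients of Table 8 with their decimal points restored.

```python
def spearman_brown(r, k):
    """Predicted reliability of a test k times as long as one with reliability r."""
    return k * r / (1 + (k - 1) * r)


# Inter-form reliabilities of subtests 1 to 7 (Table 8).
subtest_r = [0.52, 0.84, 0.78, 0.58, 0.76, 0.69, 0.40]

mean_r = sum(subtest_r) / len(subtest_r)
estimated = spearman_brown(mean_r, len(subtest_r))

print(round(mean_r, 3), round(estimated, 2))
```

The mean subtest coefficient is about .65, and stepping it up sevenfold gives about .93, in agreement with the figure reported in the text.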

The correlations in the lower part of the table indicate the 
necessity of stating the reliabilities of all measures which teach- 
ers may use, since the various types of scores are not necessarily 
equally reliable (7, pp. 122-3).

TABLE 8
Reliability Coefficients

Variables              Equivalent forms
Subtest 1              .52
        2              .84
        3              .78
        4              .58
        5              .76
        6              .69
        7              .40
Total raw score        .94
Standard score         .90
[illegible]            .91
[illegible]            .89
[illegible]            .93
[illegible]*           .96

* This is apparently based on raw scores for the age range 13-6 to 14-5, although
the manual does not make this clear.


Probable errors of measurement are given for certain types 
of scores: (a) for standard scores P.E.y = 2.6, compared with 
2.2 reported in the manual; (b) for DIQ’s: P.E.y = 3.06; (c) 
for RIQ’s: P.E.y = 3.45. 
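The probable error for standard scores can be approximately retraced. The sketch assumes the conventional formula, P.E. = .6745 σ √(1 − r), which the text does not state explicitly; the sigmas are those of the standard scores in Table 3 and the reliability is the standard-score coefficient of Table 8.

```python
from math import sqrt


def probable_error_of_measurement(sigma, reliability):
    """Conventional probable error of a score: 0.6745 * sigma * sqrt(1 - r)."""
    return 0.6745 * sigma * sqrt(1.0 - reliability)


# Standard scores: sigma 11.79 (Form C) and 12.31 (Form D); inter-form r = .90.
pe_form_c = probable_error_of_measurement(11.79, 0.90)
pe_form_d = probable_error_of_measurement(12.31, 0.90)

print(round(pe_form_c, 2), round(pe_form_d, 2))
```

Both values fall close to the 2.6 reported for standard scores, which suggests, without proving, that the figure was obtained in this way.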


Factor Analysis 


In the manual of directions the authors state that they
have chosen the content in such a way as to “have a test more
highly saturated with a common factor or ability” (10, p. 1).
In revising the TG test, for example, they eliminated those
subtests which appeared to measure a numerical ability, so
that the present revision is thought to measure “general verbal
intelligence” (11, p. 1). While, of course, the number of
subtests is probably too small and the reliabilities somewhat
inadequate to give a satisfactory indication of the factor loadings
obtainable from these tests, a factor analysis might give some
indication of the extent of the general factor. Table 9 shows
the intercorrelations, those for Form C being above and those
for Form D below the diagonal.

TABLE 9
Intercorrelations (Form C in upper, Form D in lower part)

Subtest    1      2      3      4      5      6      7
1          ..     .51?   .50    .55    .50    .45    .36
2          .53    ..     .67    .50    .63    .81    .56
3          .45    .64    ..     .58    .46    .67    .48
4          .55    .50    .48    ..     .41    .56    .49
5          .33    .43    .71    .71    ..     .55    .53
6          .49    .85    .63    .67    .64    ..     .55
7          .40    .48    [remaining entries illegible]

? Uncertain reading of the source.

56 EDUCATIONAL AND PSYCHOLOGICAL MEASUREMENT

With few exceptions the intercorrelations for the two forms
are of about the same magnitude. The first factor loadings
and the communalities for each form were computed by the
multiple-factor method (14). The obtained communalities
varied somewhat from the first estimated communalities, which
were taken to be the highest r in each column. This is, of
course, to be expected with such a small battery of tests. A
second approximation was made in each case. Only one factor
loading was computed since the correlations in the first residual
matrix were all less than 4 times the probable error of the
corresponding original correlations, making further analysis
unnecessary (14, p. 26).

The results of the analysis are shown in Table 10.

TABLE 10
Factor Loadings and Communalities for Each Form

                    Form C                        Form D
            1st App.      2nd App.        1st App.      2nd App.
Subtest     I      h²     I      h²       I      h²     I      h²
1          .67    .45    .65    .43      .59    .34    .56    .31
2          .87    .76    .87    .76      .82    .67    .80    .64
3          .77    .59    .76    .58      .80    .64    .80    .64
4          .70    .49    .69    .48      .77    .59    .76    .58
5          .71    .50    .69    .48      .77    .59    .76    .58
6          .84    .71    .83    .69      .89    .78    .89    .79
7          .68    .46    .66    .44      .59    .35    .56    .32

It appears that little was gained by making the second 
approximation since practically identical factor loadings were 
obtained on both approximations. The factor loadings are 
very similar for forms C and D. In general, subtests 1 and 
7 are less saturated with the common factor than is the case 
of the other subtests. This was verified by a cluster analysis 
(17), and also by the calculation of B-coefficients (5). 
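The first-approximation loadings for Form C can be roughly retraced from the intercorrelations. The sketch below assumes that the first factor was extracted as a centroid, that is, each column sum divided by the square root of the grand total, with the highest correlation in each column entered as the estimated communality. The coefficients are the Form C values as read from Table 9; the entry for subtests 1 and 2 is an uncertain reading, and the function name is hypothetical.

```python
from math import sqrt

# Form C intercorrelations (upper part of Table 9), keyed by subtest pair.
# The (1, 2) entry is an uncertain reading of the source.
FORM_C = {
    (1, 2): 0.51, (1, 3): 0.50, (1, 4): 0.55, (1, 5): 0.50, (1, 6): 0.45, (1, 7): 0.36,
    (2, 3): 0.67, (2, 4): 0.50, (2, 5): 0.63, (2, 6): 0.81, (2, 7): 0.56,
    (3, 4): 0.58, (3, 5): 0.46, (3, 6): 0.67, (3, 7): 0.48,
    (4, 5): 0.41, (4, 6): 0.56, (4, 7): 0.49,
    (5, 6): 0.55, (5, 7): 0.53,
    (6, 7): 0.55,
}


def first_centroid_loadings(pairs, n_tests):
    """First centroid factor loadings from pairwise correlations."""
    # Build the full symmetric matrix from the pairwise entries.
    r = [[0.0] * n_tests for _ in range(n_tests)]
    for (i, j), value in pairs.items():
        r[i - 1][j - 1] = r[j - 1][i - 1] = value
    # Estimate each communality by the highest correlation in its column.
    for j in range(n_tests):
        r[j][j] = max(r[i][j] for i in range(n_tests) if i != j)
    column_sums = [sum(r[i][j] for i in range(n_tests)) for j in range(n_tests)]
    total = sum(column_sums)
    return [s / sqrt(total) for s in column_sums]


loadings = first_centroid_loadings(FORM_C, 7)
communalities = [a * a for a in loadings]

for subtest, (a, h2) in enumerate(zip(loadings, communalities), start=1):
    print(subtest, round(a, 2), round(h2, 2))
```

The loadings obtained this way agree with the first-approximation column for Form C in Table 10 to within about .02, and with a single factor each communality is simply the square of the loading, as the tabled values also show.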

Since the subtests vary in their reliabilities and in their
factor loadings, the question of the possibility of shortening
the test without loss arose. Subtests 2, 3, and 6 have the 
highest reliabilities, and the highest first factor loading. Sub- 
tests 4 and 5 have identical factor loadings but the former is 
less reliable than the latter. Possibly a combination of sub- 
tests 2, 3, 5, and 6 would give satisfactory results. The inter- 
form correlation of scores on these four subtests was found to 
be .92, almost as high as the reliability coefficient of total raw 
scores. The use of these four subtests would reduce testing 
time from 48 to 29 minutes, a saving of 40 per cent. 


Conclusion 


In general, the results of this analysis are very similar to the 
data reported in the manual of directions, with the criticism 
that the test may be too easy at the grade 9 level since it fails 
to discriminate between the mental ages of about 20 per cent 
of the present sample. The suggestion is made that the test 
could be considerably reduced in content with little loss in 
reliability.


THE ROLE OF TESTS IN THE DIAGNOSIS AND
CORRECTION OF SPELLING DEFICIEN- 
CIES OF COLLEGE STUDENTS 


FRANCES ORALIND TRIGGS 


American Nurses’ Association 


The Problem 


An examination of the literature would seem to indicate
that the college student who is a poor speller has received little 
encouragement to do anything about improving his spelling 
skills. This is in contrast to the encouragement given the 
college student who is a poor reader through remedial classes 
and clinics. The complete explanation for this situation is not 
clear. However, the following closely related observations 
may partially account for it: 

1. Scientific study and diagnosis of spelling difficulties have
lagged behind comparable work in reading;

2. No clear-cut and easily applicable remedial techniques 
in spelling have been available; 

3. Teachers of college students are convinced that, if a 
student were ever going to learn to spell, he would have 
done so by the time he reached college. 

There is growing evidence, however, that reading and spell- 
ing, to say nothing of other language skills, are closely related 
and that actually much can be done to remedy deficiencies in 
them even at the college level. 

To that end, a remedial spelling program was set up at 
the University of Illinois during the academic year 1942-43. 
Remedial techniques were sought for this program which would 
require students not only to read about spelling, but also to 
have the experience of applying the principles studied. It was 
felt that by using such techniques, there was some assurance
that the students could more easily apply the skills both in
and out of their class work. 

For these reasons, a manual of exercises was the technique
chosen. The spelling manual devised consists, first, of a
discussion of spelling in general, of the ways by which spelling is
learned, and of the types of skills involved in spelling; second,
of a discussion of the principles of pronunciation with emphasis
on those especially applicable in aiding spelling; third, of a
discussion of word families; and fourth, of a series of “spelling
conventions” which help the students to see the system behind
the spelling of many words.

Answers to two main questions were sought from the remedial
spelling program: first, is it possible to improve spelling
skills of college students by use of spelling exercises which
require the student not only to study the principles of good
spelling but also to apply them; and second, what kinds of
skills and abilities must students have who may be expected
to improve through this remedial technique, i.e., what
background is necessary on which to build spelling skills by use of
such a technique?

Procedure 


Announcements were made in Rhetoric I and II notifying 
students that they could apply for work in the remedial spell- 
ing classes. One hundred forty-nine students applied, of whom 
one hundred were accepted in the first remedial sections opened. 
Approximately seventy students appeared at the first meetings 
of the classes. The work was carefully explained during this 
first session. Students were told that they would be required 
to do the assigned work and do it regularly, if they were going 
to attend the sessions. It was expected that every student 
would attend classes regularly once a week and spend at least 
two hours each week in preparation of manual exercises. 
Students were urged to come to the instructor’s office for special 
help previous to any class period if they had difficulty doing 
their assignments. 

Approximately twenty students did not return after this 
session, leaving about fifty in the four sections. Of these fifty, 
twenty-two were called out with the Emergency Reserve Corps.
Thus only about twenty-eight remained in the class long
enough to complete the work. 

Shortly before the first spelling sessions were over, a second 
course was arranged to accommodate those students who had 
not originally been accepted. This time some twenty students 
attended the first session, and about fourteen remained after 
the students understood the work which would be involved. 
Because of late applications, still another section was opened 
in which the students had to do twice as much work a week 
as had been planned originally. There were only about three 
who were able to do this. Test-retest evidence did not indicate 
that these students were handicapped by having to work at 
greater speed.¹

Certain objective test data were available on these stu- 
dents. Scores were available on the American Council on 
Education Psychological Examination. This is a scholastic 
aptitude test having two types of scores, “L,” and “Q.” The 
L-score purports to be indicative of language facility and re- 
lated to the student’s ability to do work requiring this type of 
facility, such as course work in English, foreign language, and 
social sciences. The Q-score purports to be indicative of the
student’s facility to do work requiring quantitative thinking 
such as is required in science and mathematics. 

In addition to these scores, scores on four other tests were 
available: an informal spelling test of the dictation type, one 
of the recognition type, the Minnesota Clerical Test, and a 
phonics test. The recognition spelling test given was the spell- 
ing section from the Cooperative English Test, Form O, testing 
ability to recognize which of several spellings of a word is 
correct. The Minnesota Clerical Test has two sections, one 
on names and one on numbers. The numbers section consists 
of columns of numbers in pairs. If the pair is exactly the same, 
the subject is to check it. The names section of the test is
similar. This test is closely timed and thus requires both 
speed and accuracy. The phonics test has two parts. Part I
tests the student’s ability to divide words into syllables; Part
II tests his ability to sound words according to a somewhat
simplified arrangement of the usual dictionary key to
pronunciation.

¹ No check on extent of comparability of these two groups was made.

Scores from these tests and from certain clinical data indi- 
cated to some extent the types of difficulties which individual 
students had. There were those students who had good basic 
language abilities and skills as shown by a high score on the 
“L” of the American Council on Education Psychological 
Examination, names of the Minnesota Clerical, and the usage 
and vocabulary subtests on the English test but low scores 
on the spelling subtest and the phonics tests. The problem in 
such a case seemed to be to make the student aware of the 
need for accurate spelling and to show him how to apply his 
skills by any of several techniques. 

The tests also revealed those students who had a potential 
facility in language, but who had never developed language 
skills. There were also those students who probably do not 
have the general ability and potentialities to develop the 
language skills necessary to succeed in college. 

Students attended class for eight weeks for one hour a 
week. Each student was given in dittoed form spelling exer- 
cises from the manual described earlier (Frances Oralind Triggs 
and Edwin Robbins, Improve Your Spelling, New York: Farrar
and Rinehart, 1944). Individual conferences with students 
allowed the instructor to individualize somewhat the work in 
the manual to fit student needs as shown by the informal diag- 
noses made from the type of work done both in and out of class. 

Those students who cared to take retests were given dif- 
ferent forms of the same tests which they had taken at the 
beginning of the work. These tests were then interpreted for 
them in individual conferences. There are two types of inter- 
pretation which can be made from such retests: interpretations 
which apply to individuals only and interpretations applying 
to the group as a whole. Individual interpretations are mainly 
of value in guiding the further growth of the student, and are 
made on the basis of both an intimate knowledge of that stu- 
dent and experience with the group as a whole. Group in- 
terpretations show general trends which result from remedial 
work. Both serve as a basis for evaluation and modification 
of procedures.


Results of Remedial Work


The dictation spelling test was built to illustrate the prin- 
ciples discussed in the manual. The students at no time 
studied the specific words in the test. Test-retest evidence 
indicates a range from a gain over the remedial period of ten 
words to a loss of one word, with a mean gain of 3.6 words. 
It is probable that results from this type of test most nearly 
reflect the ability of the student to do the spelling task usually 
required of him. 

Test-retest evidence on the clerical test was interesting in 
that there was a greater gain on the names section than on 
the numbers section. The range of gain on the names part 
was from 36 to 0, with a mean gain of 17 words. The range 
of gain on the numbers section was from 33 to minus 14, with 
a mean gain of 13 items. This group of students originally 
had markedly higher scores on the numbers section of the test 
than they had on the names section. On the retests, this 
difference was not so evident. Gains on this test probably 
indicate an improvement in ability to look within the word 
and recognize word parts rather than in ability to recognize 
the word only by its configuration. It is this type of skill 
which is used in proofreading and in reading where it is neces- 
sary to distinguish between words of like configuration such as 
“physiology” and “psychology,” “insulation” and “installation,”
and in many cases such simple words as “then” and
“than,” “also” and “solo,” and others. This type of skill prob- 
ably should not be over-emphasized because it might adversely 
affect reading skills. However, a balance between work of this 
kind and work on skills required in normal silent reading will 
probably result in improvement in both reading and spelling. 

Gains were also evident on the phonics test. On the syl- 
labification section, Part I, the mean gain was ten words, with 
a range of from 22 to minus two. On Part II of this test, the 
ability to sound words, the range of gain was from 27 to minus 
four, with a mean gain of 11 words. A gain on this test, when 
accompanied by a gain on a spelling test, suggests that students
not only have learned the tools of word recognition but
also are beginning to apply them. When these same skills
were measured by oral reading, it became even more evident
that students not only had learned them but actually were 
putting them into practice. 


The Reaction of the Students to Remedial Work 


It was interesting to note the reasons students gave for 
registering in this course. In terms of the stated motives of 
the students, they might be classified as follows: First, there 
were the students who were merely curious to know what the 
work would be like, but who did not care to put time on 
remedial work. Second, there were the sincere students who 
wanted to improve their spelling skills, but who actually did 
not have the time available to work through the manual. 
Many of these students were carrying heavy schedules besides 
actual work to help finance their education. This type of 
student is the one who is most severely handicapped by poor 
verbal skills. Our university curriculum requires a great deal 
of verbal work, yet it takes these students who have poor verbal 
skills longer to do the work; therefore, they do not have the 
time to put on the remedial work, and the longer they spend 
on their class work, the less chance there is that they will be 
able to put in the extra time on improving their skills. This 
is an illustration, surely, of the old saying “them that has, gets.” 
Third, there was a group of very sincere students who had 
time to do the work, and who did excellent, consistent class 
work. Some of these students were handicapped by poor 
scholastic aptitude and did not gain as much in the end as 
their efforts warranted; but most of this group made excellent 
improvement as measured by both daily written work required 
in their courses and by standardized tests. 

At the four weeks’ point in the remedial work, to remind 
the students of the importance of consciously trying to transfer 
skills learned in their remedial work to class work, the in- 
structor asked the students to write during class time an in- 
formal five-minute essay, expressing their reaction to the re- 
medial work, and indicating whether they had been able to 
notice any improvement in their spelling up to that time. A 
number of the reactions, written both at this time and later, 
are given below.


The real purpose of this letter is to thank you for your help.
The value of your spelling course showed itself clearly in my 
last theme for Verbal Expression. Though it was one of the 
longest compositions, it contained fewer errors than any pre- 
ceding paper. I misspelled only two or three words. It 
seemed almost unbelievable to turn page after page without 
an error. 

It will, as you said, be some time before I am able to realize 
the full benefit of your instructions. But already I can print 
legibly and at a reasonable speed. My spelling is improving, 
and one may see something in my way of doing things which 
resembles organization. My enunciation (thanks to your 
advice to visit the speech clinic) has shown some improve- 
ment. It will continue to develop since now I have the rudi- 
ments and need only practice. 

All these things you’ve done for me against my own objec- 
tions. It would have been easy for you to let me go when 
I was determined to give up. It was some time before I could 
appreciate this work of yours. Now I can see what it has done 
and will do, so I want to apologize for my lack of character, 
and thank you for all you’ve done for me. 





I have been a student of the experimental remedial spell- 
ing course for the past four weeks. In that time there has 
been a slow transition of confidence within me in all phases 
of handling and working with the English language. This 
change may not be outwardly apparent at this present 
moment, but I’m sure time will bear out that there is a 
definite improvement in this respect. 

My one regret, in regard to this course, is that it is of only 
eight weeks in length. 


From remedial spelling I have received an improvement 
in spelling. I have never studied related words before or paid 
much attention to the way the words were pronounced. 
These simple things have aided my spelling. Before I took 
this course I never thought of the different ways of spelling 
words—hand, ear, etc. 





When I started to the University of Illinois I was very 
weak in spelling. In fact I don’t think I could have been 
much worse. It seemed that I just couldn’t learn to spell. 
I couldn’t find out what was the matter. I was offered a 
chance to take this extra spelling course to improve my spell- 
ing. I was very much pleased with the chance, so I enrolled 
in the course. I have just finished four weeks of the eight- 
weeks course and I am beginning to see more closely some 
of the basic fundamentals of spelling which I had completely 
missed before. I can’t say after four sessions that I am an
outstanding speller, but I do believe that I will be a better
speller after I have finished the course. 





I know that the four hours which I have spent in spell- 
ing class have helped me a great deal. I have been doing 
better work in my regular English class and the letters which 
I have sent home have improved. 

I still am a very poor speller; however, I am able to find 
some of my faults. I believe before the spelling classes are 
over my spelling will improve a great deal more than during 
the first four weeks. 

I believe these spelling classes should be given next year 
so other students may also have a chance to improve their 
spelling. 


I believe that the help I am getting from our spelling 
class will not only help me to overcome spelling troubles, but 
it will be a great help in obtaining exactness with all my other 
work as well. In fact I have already been helped by the 
principal parts which we have taken up, mainly forming a 
picture of the word I am hunting for. Yesterday, for ex- 
ample, I had to write a theme about myself while I was in 
the process of being sworn in as a Naval Cadet, and I was 
bothered with the spelling of a couple of words I chose to use. 
My sight spelling came to my rescue, and I was able to do a 
decent piece of work on my theme. This is only one instance 
that I remember because it was so recent and much depended 
on it. 


I'll admit that after the first few classes of remedial spell- 
ing, and after seeing the long and seemingly difficult assign- 
ments I was disgusted with myself for enrolling. I had always 
told myself that I was almost infallible in spelling, but my 
mother was very disgusted about the lack of phonics in our 
grade-school system and insisted that I was a poor speller. 
Spelling came easy for me and I imagined that remedial spell- 
ing in college would be one continual spelling match, and they 
are fun. However, I found that the accuracy the work re- 
quires is helping me in many ways. I’ve discovered that there 
are many facts about spelling I had never thought about. I 
believe that this remedial work should be included in college 
Rhetoric and English sections, because many freshmen, newly 
graduated from high school, lack the fundamentals, training, 
and background to spell correctly, and the thoroughness of the 
work and assignments will aid in every course. 


Reactions of the Faculty to Remedial Work 
The faculty of the English Department was, at all times,
aware of what was being done in the remedial spelling work.
They referred students to it, and twice the work of the remedial
spelling classes was described at English staff meetings.
The cooperation of the faculty with the instructor in remedial 
spelling was excellent. There are many indications that the 
instructors welcomed the special help given these students. 
Many of them felt that they had very little time to give in- 
dividualized help in spelling, but that if such could be done, 
the results would be worth while. Certain comments of the 
faculty on individual cases are given below. 


Thank you for your note about Mr. X. He has spoken 
to me of the excellent help you have given him with his 
handwriting and with his spelling. I am greatly pleased with 
his progress, and with his attitude toward you personally. 
I shall be referring students to you in the future, urging them 
to take advantage of the opportunity of following your sug- 
gestions. 


Your course in remedial spelling has been of considerable 
help to my student, Jack Doe. Originally, he was by no 
means a hopeless speller; but his spelling was bad enough to 
handicap him in his work. The carelessness and the word- 
ignorance which caused many of his errors have been checked, 
I think, by the work he has done with you. On his themes, 
at least, he has shown an increasing awareness of the necessity 
of correct spelling. Part of his improvement has come, no 
doubt, from his general development in language skills as a 
whole, through his work in Rhetoric, and from his own in- 
tellectual and social growth; but your work with his spelling 
has unquestionably given him valuable help with that par- 
ticular aspect of his training. 

Mr. Doe has not, of course, been suddenly transformed into 
a perfect speller. That is too much to expect. But he has 
developed an interest in words themselves and has come to 
realize the importance of thinking while spelling. It is this 
new attitude, I think, which will have the most bearing on his 
continued improvement in spelling. 

If other students have gained from their work in remedial 
spelling as much as Mr. Doe has gained, I think the course
certainly should be continued for the benefit of future stu- 
dents. 


You asked what results your spelling class had on my 
student, Mr. X. 

To begin with, his spelling was very bad, though largely, 
I think, through carelessness. Almost at once his home 
themes showed great improvement as he became more conscious
of his needs, and by the end of the semester, he rarely
missed more than one or two—usually easy—words in each 
of them. 

If that improvement lasts, and if your other students did 
as well, I should certainly want to see the class continued.


I am pleased with the progress made in spelling by Miss 
Blank and Mr. Long. The spelling grades alone do not evalu- 
ate the counsel and assistance you have given these students. 
Your remedial spelling is a worth-while project and should
be continued. 


Conclusion 


The major generalization to be drawn from this study is
that poor spellers can improve their spelling skills by a re- 
medial technique such as has been described. This generali- 
zation can be made more specific by some further comments. 

There are rather complete records for ninety of the students 
of this group. A study of these records indicates the importance
of careful attention to the reasons for the spelling
difficulties. For instance, twenty-six students had poor spelling
skills mainly because of carelessness, lack of the habit of proof- 
reading what had been written, and, in general, an attitude 
that spelling is unimportant. Sixty-four students, however, 
lacked at least some of the following skills: They could not 
divide words into syllables, nor could they accent words cor- 
rectly. They had very little knowledge of the construction of 
words—that is, they did not know what suffixes and prefixes 
were. They did not realize what base words or root words 
were—and when reading orally they miscalled words of like 
configuration. Thus it became evident that they had no 
methods for attacking new words. They also had little knowl- 
edge of spelling “conventions.” Many of this group were not 
only poor spellers, but poor readers; and many of them had 
poor English skills as measured by the objective test given 
them at the beginning of the year and by subsequent class
work. 

On examining these records, it is possible to make a prog- 
nosis of the extent of success of these students as the result 
of remedial spelling work if general ability is taken into ac- 
count. In this regard, it might be said in general that if 
there is some indication of measured general ability and if
remedial work follows a careful diagnosis of difficulties, the 
prognosis of success in remedial work will be good, assuming 
the student applies himself assiduously. But for the student 
who does not have measured general ability, successful results 
cannot be universally predicted. However, it is always pos- 
sible that a student’s poor scores on the general ability test 
may be due to lack of development of language skills. If 
there is time available, an individual ability test can be used 
to determine to what extent the student is penalized by the 
form of the test given. If on the basis of an individual test 
potential ability is evident and if plenty of time is available 
for remedial work, satisfactory results may be forthcoming. 

Probably the major error made in this remedial program 
was that it was placed, for most students, on top of an already 
over-full schedule. Requirements of the remedial program 
were heavy. These students are already the ones who have 
to spend the most time in the preparation of their courses 
because of lack of verbal facility, which is a greatly needed 
tool throughout the university curriculum. 


Recommendations 


On the basis of experience with this remedial program, it 
is recommended, first, that the students who are poor spellers 
be segregated and their records examined at the very begin- 
ning of the school year; second, that the reason for this dis- 
ability be determined in each individual case; third, that a 
stated requirement be made of these students if they are to 
pass English; and fourth, that a special place in the curriculum 
be given for remedial training as may be required. If the 
student’s disability is great enough, his whole program should 
be lightened to allow time enough to do the remedial work, 
and do it well. It has been found time and again that, where 
such an approach is taken, the student’s improvement is appar- 
ent not only in spelling but in other language skills as well, 
and that this improvement is carried over into his course work. 

One further observation should be made. The motivation 
of the student is a major factor in the degree of success he will 
have in any type of remedial work but should probably receive
special consideration in the remedial spelling program. The 
extent to which it is important for an individual to follow 
spelling conventions will probably be a determining factor in 
his motivation for remedial work in spelling. Though the 
clinician or instructor working with him may realize that prob- 
ably no strict line of demarcation exists between reading, 
spelling, and other language skills, it may be difficult to con- 
vince the student of this fact. If he is aware only of his 
spelling disability and has “gotten by” this long, it may be 
somewhat difficult to convince him that he cannot always “get 
by” with no handicap to himself. It is therefore recommended 
that the well-motivated students, as well as the students for 
whom prognosis of success in remedial spelling is good, be the 
ones to receive attention first, at least while remedial tech- 
niques are being evaluated. 

There is always the question of how much responsibility 
the university can take in developing sub-college English, 
spelling, and reading skills. This, of course, is a matter of 
policy to be set by the school in question. However, it is 
suggested that, if it is possible to demonstrate that spelling can 
be taught at the college level, the public schools may be helped 
to realize that it can also be taught at the lower educational 
levels. They may then take over the responsibility at that 
level and relieve the college of the necessity of worrying 
about it. 

DISCRIMINATIVE VALUE AND PATTERNS OF
THE WECHSLER-BELLEVUE SCALES IN
THE EXAMINATION OF DELINQUENT
NEGRO BOYS

JOSEPH CHARLES FRANKLIN¹
Civilian Public Service # 115, Laboratory of Physiological Hygiene, 
University of Minnesota 

The psychologist working with delinquents in an institutional
setting is obliged usually to maximize the validity and
utility of his findings in individual case and group studies with 
the least expenditure of time, energy, and resources. Conse- 
quently, he is most likely to turn to the supply of available 
tests and, applying criteria growing out of his purposes and 
determining test “goodness” in relation to prospective testees, 
to select those test materials which are most easily admin- 
istered, scored, and interpreted. 

In intellective measurement the use of tests on subjects 
differing from the standardization populations from which the 
norms derive, in one or more significant variables, involves 
concern with the attainment of valid and meaningful 
measurement. 

The Cheltenham School for Boys, Cheltenham, Maryland 
is a State Institution for delinquent Negro boys. The back- 
ground of the boys committed is commonly one of social and/or 
personal maladjustment. Their previous life conditions are 
marked by broken homes, inadequate familial organization and 
integration, poor supervision, and neglect. The incidence, 
variously, of sub-standard shelter, poverty, lack of medical 
care, and even malnutrition, is preponderant. These children 
are seriously retarded educationally, approximately at the
third-grade level, at the chronological age of fourteen and a
half; and truancy, suspension and expulsion from school, and
consistent failure largely characterize their formal education.

1 The writer is indebted to Donald L. Grummon for cooperation in the
administration of the Scales and to Charles W. Piersol for assistance
in the completion of the study.

At the practical level of mental testing of such subjects, 
awareness of and consideration for the implications of the state 
of psychological knowledge in such areas of theoretical research 
as the following are of methodological and evaluative impor- 
tance: nature and nurture, race and nationality differences, 
rural and urban effects, equality of normal opportunity for 
socially, educationally, and intellectually stimulating experience 
or the lack of it, the fixity or flexibility of mental capacities, 
and the organization of mental abilities. 

It is beyond the scope of this study to discuss the relation- 
ships between the conflicting conclusions of research in these 
fundamental problems and the construction and use and in- 
terpretation of obtained results in the mental measurement of 
Negroes. Highly useful references are provided in
bibliographies compiled by Bean (1) and the editors of the Journal
of Negro Education (14). 

Nevertheless, keeping the relevance of the basic issues in 
mind serves two worth-while purposes. First, survey of avail- 
able tests reveals the inadequacies of existing materials with 
resultant difficulty in selecting a “good” test (particularly with 
regard to standardization and norms) for Negroes, much less 
delinquent Negro children. Secondly, the need in intellective 
measurement is observed to be shifting from simple over-all 
characterization of mental status to intra- and inter-individual 
comparisons of partialled-out components or aspects of in- 
tellective functions. It becomes obvious that in these terms 
tests easily administered, quickly scored, readily interpretable, 
and suitable to our subjects are not available.

Preliminary use and appraisal of various group and individual
mental tests were made. It was found that the Wechsler-Bellevue
A & A Scales provided maximally useful information regarding
mental status and facilitated needed qualification of test
results with respect to the fundamental problems already
mentioned.

Wechsler did not include Negroes in his standardization 
and specifically urges caution in the use of his test with
non-whites. Nevertheless, the apparent and distinct advantages
of the Wechsler-Bellevue Scales in classification, insofar as 
classification depends upon mental status, warranted further 
use and study of the test. The bias of the standardization as 
related to this minority group is a serious incongruity but sub- 
stantially no greater than that involved in the use of other tests 
the results of which did not favorably compare in usefulness 
and meaningfulness with the Wechsler-Bellevue. The proper 
extension of the use of the Wechsler-Bellevue to Negroes de- 
pends upon such data as those which this report in part 
provides. 

Purposes of the Investigation


In order to assess objectively the suitability of the Wechsler- 
Bellevue Scales for the intellective testing of institutionalized 
Negro boys, this study was undertaken to answer the following 
questions: How does the test sift and sort the population as to 
mental level? Do the sub-tests positively discriminate among 
the subjects as they are classified within the various mental 
level categories? What are the patterns and trends of per- 
formance of the total and sub-groups on the sub-tests? Is 
the suggested use of a short form warrantable with this
population? 

Procedure 


Two hundred and seventy-six boys were given the Wechs- 
ler-Bellevue (both Verbal and Performance Scales) during 
1943-44. The average institutional population during this 
period was about two hundred and seventy. For the most 
part boys were routinely tested shortly after admittance but 
some were especially referred for testing for purposes of classi- 
fication from among those admitted prior to the initiation of 
the program of intellective testing. 

The Wechsler-Bellevue Scales consist of eleven sub-tests, 
one of which is the Vocabulary alternate in the Verbal Scale, 
which was not used. The five Verbal sub-tests depend heavily 
upon language for administration and for subject responses. 
These primarily involve abstractual, conceptual, and
generalizing mental functions. According to Wechsler as reported
by Rosenzweig, Bundas, Lumbry and Davidson (10) they are 
described as follows:

1. Information: consists of questions formulated to tap 
the subject’s range of information on material that the average 
person with average opportunity should be able to obtain for 
himself. 

2. Comprehension: measures the use of “common sense” 
and judgment in situations described to the subject. Success 
on this test seemingly depends upon the possession of a certain 
amount of practical information and a general ability to use 
past experience. 

3. Arithmetical Reasoning: measures mental alertness as 
well as ability to handle practical calculations. 

4. Memory Span for Digits: measures immediate memory 
for digits forward and backward. 

5. Similarities: measures ability to discriminate between 
essential and superficial likenesses; to generalize and think in 
abstract terms. 


The five Performance sub-tests require the subject to ma- 
nipulate concrete materials and to perform certain tasks such 
as arranging pictures and assembling object forms. The same 
authors describe them as follows: 


6. Picture Arrangement: detects ability to comprehend or 
“size up” a total situation. 

7. Picture Completion: measures ability to differentiate 
essential from unessential details. 

8. Block Design: a test of general intellectual functioning, 
involving both synthetic and analytic ability, but weighted 
considerably with ability to solve problems in spatial relations. 

9. Digit Symbol: measures speed and accuracy of learning 
new associations. 

10. Object Assembly: measures insight into spatial relation- 
ships of familiar objects. 


Each sub-test contains items which are related to a com- 
ponent mental-function and the items are arranged in order 
of increasing difficulty. Scores on sub-tests are converted into 
“weighted” scores which make possible direct comparison of 
the various sub-test performances. Separate Verbal and Per- 
formance I.Q.’s are obtained by summating the appropriate 
sub-tests, and these in turn are combined in an over-all mea- 
surement, the Full I.Q. 

Results


The chronological age range of the 276 subjects was from 
9.63 years to 20.13 years with a Mean of 14.6 and a S.D. of 1.56 
years. Results for the total group are given in Table 1. 


TABLE 1
Average Performance for Entire Group (276 Cases)

                         Mean     Median    S.E.     S.D.
Full I.Q. ............   76.5     76.6       .926    15.39
Verbal I.Q. ..........   76.2     75.8       .869    14.45
Performance I.Q. .....   80.4     82.9      1.094    18.19
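
As a check on Table 1, the standard errors can be recomputed from the printed standard deviations. The sketch below assumes the usual formula, S.E. = S.D. divided by the square root of N, which the text does not state explicitly, and it assumes that the first two rows of the table are the Full and Verbal I.Q. results. The function name is illustrative only.

```python
import math


def standard_error_of_mean(sd: float, n: int) -> float:
    """Standard error of a mean, assuming S.E. = S.D. / sqrt(N)."""
    return sd / math.sqrt(n)


N = 276  # number of subjects reported in the text

# (printed S.D., printed S.E.) as read from Table 1; row labels are assumed
table_1 = {
    "Full I.Q.": (15.39, 0.926),
    "Verbal I.Q.": (14.45, 0.869),
    "Performance I.Q.": (18.19, 1.094),
}

for label, (sd, printed_se) in table_1.items():
    computed = standard_error_of_mean(sd, N)
    print(f"{label}: computed {computed:.3f}, printed {printed_se:.3f}")
```

Under these assumptions the recomputed values agree with the printed ones to within rounding in the third decimal place.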





On the basis of individual test results the subjects were 
grouped according to Wechsler: Normal (91-110); Dull Nor- 
mal (80-90); Borderline (66-79); Mentally Defective (below 
66). Test results for these groups are presented in Table 2. 
Comparisons of measures of central tendency and dispersion 
may be made since the age distributions within the sub-groups 
are practically identical. These results are summarized in
Table 3.

TABLE 2
Performance Data of Sub-Groups According to Mental Level

Group                        N     Mean I.Q.   Median I.Q.   S.D.
Normal ..................    52      98.0         97.3        5.29
   Verbal ...............    52      95.3         95.4        7.12
   Performance ..........    52     100.8        100.8        6.86
Dull Normal .............    64      85.1         84.7        3.37
   Verbal ...............    64      82.6         82.4        7.26
   Performance ..........    64      90.1         89.5        6.05
Borderline ..............    90      72.9         73.4        3.94
   Verbal ...............    90      73.1         72.9        7.17
   Performance ..........    90      78.8         79.5        7.62
Mentally Defective ......    70      55.8         56.0        7.53
   Verbal ...............    70      60.8         60.0        8.20
   Performance ..........    70      60.7         60.3       10.62
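
The grouping by Full I.Q. described in the text (Normal 91-110, Dull Normal 80-90, Borderline 66-79, Mentally Defective below 66) can be written as a small lookup. This is only a sketch: the function name is hypothetical, whole-number I.Q.'s are assumed, and the label returned for scores above 110 is a placeholder, since the text lists no category above Normal.

```python
def classify_full_iq(iq: int) -> str:
    """Assign a mental-level category from a whole-number Full I.Q.

    Cut-offs follow the ranges quoted in the text. The label for
    scores above 110 is a placeholder not taken from the source.
    """
    if iq < 66:
        return "Mentally Defective"
    if iq <= 79:
        return "Borderline"
    if iq <= 90:
        return "Dull Normal"
    if iq <= 110:
        return "Normal"
    return "Above 110 (no category listed)"


# Example: rounded group means from Table 2 fall in their own categories
for mean_iq in (98, 85, 73, 56):
    print(mean_iq, classify_full_iq(mean_iq))
```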





TABLE 3
Age Data for Mental Level Sub-Groups

Group                     N     Range          Mean age   S.D.   S.E. mean
Normal ...............    52    11.13-18.63    14.8       1.63   .218
Dull Normal ..........    64     9.63-18.63    14.5       1.64   .208
Borderline ...........    90    11.13-18.63    14.4       1.34   .141
Mentally Defective ...    70    10.11-20.13    14.6       1.64   .196


Discriminative Value of the Sub-Tests

Wechsler, Israel, and Balinsky (13) and Lewinski (5) have
reported positive discriminative values of the sub-tests of the
Wechsler-Bellevue Scales in differentiating between the various
intellective levels. Their studies, however, were done with
quite different samples from that with which we are here
concerned.

In order to ascertain the discriminative values of the sub- 
tests in differentiating between subjects categorized on the 
basis of total test results, the differences in mean weighted 
scores, the standard errors of these differences, and the critical 
ratios were calculated. Table 4 shows that all of the sub-tests 
discriminate between the various levels with the exception of 
three: (1) the Digit Span did not satisfactorily distinguish
the Normal from the Dull Normal subjects, (2) the Digit 
Symbol did not significantly discriminate the Dull Normal 
from the Borderline, and (3) the Picture Arrangement did 
not significantly separate the Normal from the Dull Normal. 
While the results generally agree with those of Wechsler, Israel, 
and Balinsky and with those of Lewinski, they differ at several 
points. The former found the Digit Span test of questionable 
value in discriminating between Borderline and Defective sub- 
jects whereas in this situation the same test does discriminate 
significantly between these two groups. The latter obtained
significant discrimination on the Digit Span between all groups. 
In this study, however, the Digit Span failed to differentiate 
significantly between the Normal and Dull Normal groups. 
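
The computation described above (difference in mean weighted scores, standard error of the difference, critical ratio) can be sketched as follows. The text does not give its formula, so this assumes the standard error of a difference between two independent means, the square root of the sum of the squared standard errors. The example uses the Comprehension means and standard errors printed in Table 5 for the Normal and Dull Normal groups; the function name is illustrative.

```python
import math


def critical_ratio(mean_a: float, se_a: float, mean_b: float, se_b: float):
    """Return (difference, S.E. of difference, critical ratio).

    Assumes independent groups, so that
    S.E. difference = sqrt(se_a**2 + se_b**2).
    """
    diff = mean_a - mean_b
    se_diff = math.sqrt(se_a ** 2 + se_b ** 2)
    return diff, se_diff, diff / se_diff


# Comprehension: Normal (mean 9.54, S.E. .39) vs. Dull Normal (6.79, .25)
diff, se_diff, cr = critical_ratio(9.54, 0.39, 6.79, 0.25)
print(f"difference {diff:.2f}, S.E. {se_diff:.2f}, C.R. {cr:.2f}")
```

Under this assumption the difference (2.75) and its standard error (.46) match the Comprehension entries in Table 4. The critical ratio comes out slightly under the printed 6.0, which is what one obtains by dividing 2.75 by the already rounded .46.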


Patterns of Sub-Test Performance 


Inspection of the sub-test performances (see Table 5) shows 
that for the entire 276 subjects the five best-performed were 
in the Performance Scale with the exception of Block Design, 
the Similarities test of the Verbal Scale placing fourth in the list 
of the first five. Accordingly, Block Design plus all of the 
Verbal sub-tests with the exception of Similarities ranked in
the lower half of the ten sub-tests. In rank order the three
highest, i.e., best-performed, sub-tests were Object Assembly,
Picture Arrangement, and Picture Completion; the three lowest,
i.e., most poorly-performed, were Arithmetic, Information,
and Block Design. Quite clearly, performance materials are
more efficiently handled and at a higher level than verbal
materials. The results pertaining to the performance of the
entire group on the sub-tests together with rank order of each
of the ten sub-tests are set forth in Table 5.


TABLE 4
Discriminative Values of Sub-Test Performances Between Mental Levels

Sub-test groups                             Difference   S.E. Difference   C.R.

 1. Information
    Normal vs. Dull Normal ..............     2.50           .15          16.6
    Dull Normal vs. Borderline ..........     1.17           .26           4.5
    Borderline vs. Mentally Defective ...     1.14           .17           6.7
 2. Comprehension
    Normal vs. Dull Normal ..............     2.75           .46           6.0
    Dull Normal vs. Borderline ..........     1.05           .28           3.8
    Borderline vs. Mentally Defective ...     2.21           .23           9.6
 3. Arithmetic Reasoning
    Normal vs. Dull Normal ..............     2.55           .45           5.7
    Dull Normal vs. Borderline ..........    [1.33]          .42           3.2
    Borderline vs. Mentally Defective ...     1.91          [.37]          5.2
 4. Digit Span
    Normal vs. Dull Normal ..............      .75           .44          [1.7]
    Dull Normal vs. Borderline ..........     1.16          [.37]         [3.1]
    Borderline vs. Mentally Defective ...     1.75           .35           5.0
 5. Similarities
    Normal vs. Dull Normal ..............     1.73           .42           4.1
    Dull Normal vs. Borderline ..........     1.12           .30          [3.7]
    Borderline vs. Mentally Defective ...     2.67           .31           8.6
 6. Picture Completion
    Normal vs. Dull Normal ..............     1.81           .43           4.2
    Dull Normal vs. Borderline ..........     1.29           .39          [3.3]
    Borderline vs. Mentally Defective ...     2.82           .39           7.2
 7. Picture Arrangement
    Normal vs. Dull Normal ..............     1.15           .42           2.7
    Dull Normal vs. Borderline ..........     1.44           .40           3.6
    Borderline vs. Mentally Defective ...     3.33           .35           9.5
 8. Object Assembly
    Normal vs. Dull Normal ..............     1.36           .45           3.0
    Dull Normal vs. Borderline ..........     1.44           .42           3.4
    Borderline vs. Mentally Defective ...     3.25           .44           7.4
 9. Block Design
    Normal vs. Dull Normal ..............     2.35           .42           5.6
    Dull Normal vs. Borderline ..........     1.80           .35           5.1
    Borderline vs. Mentally Defective ...     2.20           .30           7.3
10. Digit Symbol
    Normal vs. Dull Normal ..............     1.62           .34           4.8
    Dull Normal vs. Borderline ..........      .28           .26           1.1
    Borderline vs. Mentally Defective ...     1.77          [.27]          6.6


TABLE 5
Data and Rankings in Performance on Sub-Tests of Mental-Level Groups

                             Ranking of           Mean weighted           S.E.
Sub-test group               sub-test      N      score           S.D.    mean

 1. Information
    Normal ...............      9          52       7.42          2.87    .40
    Dull Normal ..........     10          64       4.92          1.84    .23
    Borderline ...........     10          90       3.75          1.12    .12
    Mentally Defective ...      9          70       2.61          1.00    .12
    Total ................      9.5       276       4.41          2.38    .14
 2. Comprehension
    Normal ...............      4          52       9.54          2.81    .39
    Dull Normal ..........      6          64       6.79          2.01    .25
    Borderline ...........      6          90       5.74          1.21    .13
    Mentally Defective ...      7          70       3.53          1.57    .19
    Total ................      6         276       6.15          2.88    .17
 3. Arithmetic Reasoning
    Normal ...............      8          52       7.73          2.15    .29
    Dull Normal ..........      9          64       5.18          2.59    .32
    Borderline ...........      9          90       3.85          2.58    .27
    Mentally Defective ...     10          70       1.94          2.22    .26
    Total ................      9.5       276       4.41          3.12    .19
 4. Digit Span
    Normal ...............     10          52       7.48          2.35    .33
    Dull Normal ..........      7          64       6.73          2.28    .29
    Borderline ...........      7          90       5.57          2.17    .23
    Mentally Defective ...      4          70       3.82          2.28   [.27]
    Total ................      7         276       5.76          2.61    .16
 5. Similarities
    Normal ...............      6          52       9.32          2.55    .35
    Dull Normal ..........      4          64       7.59          1.87    .23
    Borderline ...........      4          90       6.47          1.87    .19
    Mentally Defective ...      5          70       3.80          1.98    .24
    Total ................      4         276       6.59          2.79    .17
 6. Picture Completion
    Normal ...............      3          52       9.71          2.44    .29
    Dull Normal ..........      3          64       7.90          2.53    .32
    Borderline ...........      3          90       6.61          [?]     [?]
    Mentally Defective ...      6          70       3.79          2.69    .32
    Total ................      3         276       6.78          3.35    .20
 7. Picture Arrangement
    Normal ...............      2          52      10.40          2.00    .28
    Dull Normal ..........      2          64       9.25          2.47    .31
    Borderline ...........      2          90       7.81          2.33    .25
    Mentally Defective ...      3          70       4.48          2.12    .25
    Total ................      2         276       7.79          3.16    .19
 8. Object Assembly
    Normal ...............      1          52      10.73          2.30    .32
    Dull Normal ..........      1          64       9.37          2.55    .32
    Borderline ...........      1          90       7.93          2.55   [.27]
    Mentally Defective ...      1          70       4.68          2.93    .35
    Total ................      1         276       7.97          3.52    .21
 9. Block Design
    Normal ...............      5          52       9.38          2.28    .32
    Dull Normal ..........      5          64       7.03          2.19    .27
    Borderline ...........      8          90       5.23          2.09   [.22]
    Mentally Defective ...      8          70       3.03          1.76   [.21]
    Total ................      8         276       5.90          3.03   [.18]
10. Digit Symbol
    Normal ...............      7          52       8.25          1.97   [.27]
    Dull Normal ..........      8          64       6.63          1.66    .21
    Borderline ...........      5          90       6.35          1.44   [.15]
    Mentally Defective ...      2          70       4.58          1.87    .22
    Total ................      5         276       6.33          2.09    .13
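
The rank order within a group follows directly from the mean weighted scores in Table 5. The sketch below ranks the ten sub-tests for the Borderline group, highest mean first, and gives tied scores the average of the ranks they occupy (the convention that yields a rank such as 9.5). The function name is illustrative, and the tie rule is an assumption inferred from the 9.5 entries in the table rather than something the text states.

```python
def average_ranks(scores: dict) -> dict:
    """Rank items from highest to lowest score; ties share the mean rank."""
    ordered = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    ranks = {}
    i = 0
    while i < len(ordered):
        j = i
        # Extend j to cover every item tied with the item at position i
        while j + 1 < len(ordered) and ordered[j + 1][1] == ordered[i][1]:
            j += 1
        shared_rank = ((i + 1) + (j + 1)) / 2
        for name, _ in ordered[i:j + 1]:
            ranks[name] = shared_rank
        i = j + 1
    return ranks


# Mean weighted scores of the Borderline group, as printed in Table 5
borderline_means = {
    "Information": 3.75,
    "Comprehension": 5.74,
    "Arithmetic": 3.85,
    "Digit Span": 5.57,
    "Similarities": 6.47,
    "Picture Completion": 6.61,
    "Picture Arrangement": 7.81,
    "Object Assembly": 7.93,
    "Block Design": 5.23,
    "Digit Symbol": 6.35,
}

for name, rank in sorted(average_ranks(borderline_means).items(),
                         key=lambda item: item[1]):
    print(f"{rank:>4}  {name}")
```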

For the purposes of ascertaining the patterns of performance 
for each of the various mental-level groups the mean weighted 
scores and their standard deviations on each of the sub-tests 
were computed. The data are tabulated in Table 5 and pre- 
sented graphically in Figure 1. The striking similarity of the 
curves for all groups—regardless of intellective status—indi- 
cates systematic and consistent variations for the population 
in organization of mental abilities and hence in their de- 
velopment. 

For the population and for all mental-level groups the back- 
ground of general information and the mental alertness linked 
with the ability to perform mental mathematical computations 
constitute a special deficiency (Information and Arithmetic). 
The subjects were uniformly better able to comprehend or “size 
up” total situations than to distinguish between essential and 
unessential details and parts of common objects and forms 
(Picture Arrangement and Picture Completion). Character- 
istically low performance on Block Design indicates poor syn- 
thetic and analytic abilities in dealing with more complicated 
problems of spatial relationships as contrasted with ability to 
solve problems of simple spatial relationships in assembling 
familiar objects for Object Assembly, which was the
best-performed sub-test in all groups.

Some differences, however, are noted in the profiles in
Figure 1. Information is consonantly low but exceeds Arithmetic
at the Defective level, falls at about the same place at
the Borderline level but falls below Arithmetic at the Dull
Normal and Normal levels. Digit Span exceeds Arithmetic at the
lower three levels but lies below Arithmetic at the Normal level.
The Similarities sub-test exceeds all other Verbal sub-tests
with the exception of Comprehension at the Normal level,
which is higher. Digit Symbol is about the same as Object
Assembly at the Defective but falls far below the latter at all
other levels.

FIGURE 1

Legend: - - - Average sub-test performance required to obtain Full I.Q.
              of 100 (14-6 yrs.)
        D     Mentally Defective
        B     Borderline
        DN    Dull Normal
        N     Normal

Scatter (variability of performance achievement among the
sub-tests) in the Wechsler-Bellevue is associated with states of
maladjustment, neuroticism, and psychoses. Diagnostic clinical
signs are related to patterns of sub-test success and failure
(3, 6, 9, 11, 12). Work in this area is in the experimental
stage and the findings reported, while not conclusive as to
relationships between psychometric test patterns and mental
illness, are suggestive. It is a matter of conjecture as to what
extent psychopathology or psychological maladjustment influenced
the range and level of sub-test performances of the subjects.
It may be presumed, since few of the subjects examined could
have been regarded as psychotic, that presence of clinical
factors does not seriously militate against interpretation of the
data according to organization and level of the mental abilities.
It is noteworthy, nevertheless, that examination of the test
profiles of groups above the Defective level discloses that in
sub-test performance five relate positively, two negatively, and
two indecisively with Wechsler’s (12) diagnostic pattern for
adolescent psychopathic personality trends.

The consistent and paralleling variation in sub-test per- 
formance of all subjects regardless of mental level raises im- 
portant questions relevant to (1) the study of race differences 
in intellective abilities and (2) the relationships of systematic 
lower-level performance in tests of intelligence by minority 
groups to the extent to which success depends upon such factors 
as education, training, and experience (4, 7, 14). It may be 
that the group patterns of sub-test performance reported here 
reflect relative handicaps in mental development rather than 
manifest strengths and weaknesses of intellective functions. 
Fewer or other depressants to maximal mental development 
may exist in the white population on which the Wechsler- 
Bellevue Test was standardized. Investigation is needed to 
discriminate the sub-tests in terms of the degree to which edu- 
cational and social experiences and achievements are prerequi- 
site to differential success in sub-test performance. 


Use of the Short Form of the Wechsler-Bellevue 


Rabin (8) has offered an abbreviated form of the Wechs- 
ler-Bellevue Scales. Using the Comprehension, Arithmetic, 
and Similarities sub-tests and computing the total weighted 
score by dividing the sum of the weighted scores of these three 
sub-tests by three and then multiplying by ten, Rabin re- 
ported correlations of .95 with the results from administration 
of the ten sub-tests. It was his opinion that the regional and
educational homogeneity of his subjects rendered his choice of 
sub-tests a good one for a short form of the Wechsler-Bellevue 
Scales. The author stated that because the Short Form is 
primarily a verbal test it might not prove satisfactory for use 
with persons with a non-English language background. Rabin 
advised further use of the suggested Short Form with other 
groups of subjects for experimental purposes. 
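
Rabin's computation as described above (sum the weighted scores of Comprehension, Arithmetic, and Similarities, divide by three, multiply by ten) can be sketched as below. The function name is illustrative. The example applies the formula to the total-group means from Table 5 and compares the result with the sum of all ten sub-test means; converting either total to an I.Q. requires Wechsler's conversion tables, which are not reproduced here.

```python
def rabin_short_form_total(comprehension: float,
                           arithmetic: float,
                           similarities: float) -> float:
    """Estimated ten-sub-test total weighted score from three sub-tests."""
    return (comprehension + arithmetic + similarities) / 3 * 10


# Total-group mean weighted scores for all ten sub-tests, from Table 5
total_group_means = {
    "Information": 4.41, "Comprehension": 6.15, "Arithmetic": 4.41,
    "Digit Span": 5.76, "Similarities": 6.59, "Picture Completion": 6.78,
    "Picture Arrangement": 7.79, "Object Assembly": 7.97,
    "Block Design": 5.90, "Digit Symbol": 6.33,
}

short_total = rabin_short_form_total(
    total_group_means["Comprehension"],
    total_group_means["Arithmetic"],
    total_group_means["Similarities"],
)
full_total = sum(total_group_means.values())

print(f"short-form estimate {short_total:.1f}, full-scale sum {full_total:.1f}")
```

For these group means the short-form estimate falls below the sum of the ten sub-test means, because the three verbal sub-tests it relies on are among the lower-scoring ones in this population.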

In order to investigate the suitability of the use of the 
Short Form with our subjects, the data were analyzed accord- 
ing to Rabin’s method. All subjects were native-born with a 
common English linguistic background. 

According to this method I.Q.’s differed significantly from 
those deriving from administration of the ten sub-tests for all 
mental-level groups with the exception of the Normal. For 
the total of two hundred and seventy-six cases the Mean I.Q. 
yielded by the short form was 72.3, which was significantly 
lower by 4.2 I.Q. points than the Mean I.Q. (76.5) derived 
from administration of the full test. In every mental-level 
group the Short Form resulted in a lower I.Q. than the ten 
sub-tests. In Table 6 data pertaining to the analysis are given. 


                             TABLE 6 
   Comparison of Results: Short Form and Full Wechsler-Bellevue 

                              Mean I.Q.   Mean I.Q.   Mean    S.E. 
Group                     N   full test   Rabin       Diff.   Diff.   C.R. 
Normal ................   52    98.0        97.2      - .8    1.41     .6 
Dull Normal ...........   64    85.1        80.0      -5.1    1.18    4.3 
Borderline ............   90    72.9        68.5      -4.4    1.02    4.3 
Mentally Defective ....   70    55.8        51.4      -4.4    1.08    4.1 
Total .................  276    76.5        72.3      -4.2     .58    7.4 
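
The critical ratios in Table 6 can be checked by dividing each mean difference by its standard error. The sketch below does this with the table entries; the cut-off of 3.0 is an assumed conventional threshold, not one stated in the text, and values recomputed from the rounded entries may differ slightly from the printed ones.

```python
# (mean difference, standard error of the difference) as read from Table 6
TABLE_6 = {
    "Normal": (-0.8, 1.41),
    "Dull Normal": (-5.1, 1.18),
    "Borderline": (-4.4, 1.02),
    "Mentally Defective": (-4.4, 1.08),
    "Total": (-4.2, 0.58),
}

ASSUMED_THRESHOLD = 3.0  # assumption: conventional cut-off, not given in the text


def critical_ratio(mean_diff, se_diff):
    """Critical ratio: absolute mean difference over its standard error."""
    return abs(mean_diff) / se_diff


for group, (diff, se) in TABLE_6.items():
    cr = critical_ratio(diff, se)
    verdict = "significant" if cr >= ASSUMED_THRESHOLD else "not significant"
    print(f"{group:<20} C.R. = {cr:4.1f}  ({verdict})")
```

Under this assumed threshold only the Normal group falls short of significance, in agreement with the statement above.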

It is concluded, therefore, that the use of the Rabin Short 
Form of the Wechsler-Bellevue Scales is not a steady or satis- 
factory substitute for the ten sub-tests of the Wechsler-Belle- 
vue with the subjects examined. Caution dictates that the 
Short Form should not be used with subjects resembling those 
examined in this study. It appears obvious that the Short 
Form should not be used in the mental examination of subjects 
whose verbal abilities are inferior to their performance abilities. 


Summary 


The Wechsler-Bellevue Scales for individual mental testing 
were administered to 276 institutionalized delinquent Negro 
boys. The chronological age range was from 9.63 years to 
20.13 years, with a Mean of 14.6 and a S.D. of 1.56 years. 

The study was undertaken in order to report the results of 
the use of the Wechsler-Bellevue on this population, to investi- 
gate the discriminative values of the ten sub-tests of the Scales 
among the various mental levels, to summarize the trends and 
patterns of sub-test performances of the population and of the 
subjects grouped according to level of intellective ability, and 
to examine the suitability of a suggested Short Form of the 
Wechsler-Bellevue Scales for the mental measurement of in- 
stitutionalized delinquent Negro boys. 

1. Results of the administration of the Wechsler-Bellevue 
placed 19 per cent at the Normal level, 25 per cent at the Dull 
Normal, 33 per cent at the Borderline, and 23 per cent at the 
Defective level.² 

2. With the exception of the Defective group, the Per- 
formance I.Q.’s exceeded the Verbal I.Q.’s by 5.5 points for the 
Normal group, 7.5 points for the Dull Normal, and 5.7 for the 
Borderline group. Over-all, the Mean Performance I.Q. ex- 
ceeded the Mean Verbal I.Q. by 4.2 points. 

3. The sub-tests of the Wechsler-Bellevue Scales discrim- 
inate significantly between the several intellective levels (as 
derived from the full test) with the following exceptions: Digit 
Span did not prove satisfactory in distinguishing between the 
Normal and Dull Normal subjects, Digit Symbol between Dull 
Normal and Borderline subjects, and Picture Arrangement be- 
tween the Normal and Dull Normal. 

4. There is marked similarity in the patterns of perform- 
ance from mental level to mental level. The group as a whole 
shows striking disparity of achievement on the sub-tests. 
These differences in performance have relevance to the study 
of racial differences. Those sub-tests characteristically per- 
formed at lower levels should be studied further in order to 
evaluate the role played by previous life conditions in their 
successful performance. 

2 A considerable increase of percentages in the higher mental levels would 
result if greater weight were attached to Performance achievement at the expense 
of Verbal in determination of the Full I.Q.’s. 

5. Consideration in interpretation of the reported results 
should be given to the fact that non-whites were not included 
in the standardization of the Wechsler-Bellevue Scales. Some 
uncalculated error of measurement may have resulted from the 
presence in the subjects of states of negative adjustment, of 
which there are indications according to the positive clinical 
signs developed by Wechsler and others. 

6. The Short Form of the Wechsler-Bellevue by which the 
Full I.Q. is derived from performance on three of the ten sub- 
tests (Comprehension, Arithmetic, and Similarities) was not 
suited to mental measurement of the individuals examined. 
Evidence indicates that the Short Form should not be used 
with individuals whose Verbal abilities are inferior to their Per- 
formance abilities. 


Criticism of Benedict and Weltfish’s The Races of Mankind, which raised a 
controversy by presenting evidence to show that there was no relation between 
skin color and intelligence, is made on the grounds that their selection of data is 
open to censure. By use of an analysis of variance technique, it can be shown 
that “skin color as well as geography did affect the test scores of recruits in 1918.” 
It would have been better, therefore, if Benedict and Weltfish had given all of the 
data, and then gone on to argue that it is the Negro’s educational disadvantage 
which handicaps him in such situations. Lorraine Bouthilet. 

Eighty-six women were selected and trained in a ten-months’ electronic en- 
gineering course. The data analyzed included test and rating scores, final grade- 
point averages (GPA) for the course, and termination records. The author found 
selection was based primarily on interviewer’s over-all judgments of fitness. GPA 
had significant correlations with American Council on Education Cooperative Gen- 
eral Mathematics Test for High-School Students, the Wonderlic Personnel Test, 
previous school grades, “fitness” rating, and “personality” rating. In comparisons 
between high and low achieving students and terminating students, the ACE mathe- 
matics test, the Wonderlic Personnel Test, and the Kuder Preference Record, compu- 
tational key, showed significant differences. E. C. Bell. 

LXV (1944), 197-217. 

This reports a follow-up study of 138 children, comprising two groups between 
ages 2 and 6, who were examined on both Forms L and M of the Revised Stanford- 
Binet Scale during its standardization, and then retested 10 years later on Form L. 
Previous studies of I.Q. constancy, involving initial tests at the pre-school level and 
retests at varying intervals, are cited for purposes of comparison and contrast with 
the author’s findings. Correlations ranging from .58 to .67 for both groups and 
both forms indicate, in the author’s judgment, a significant predictive value for the 
Stanford-Binet equalling, if not surpassing, other tests, and assure the importance 
in prognosis of the pre-school I.Q. for the group and for the individual when accom- 
panied by supplementary data. Vernon S. Tracht. 





Brown, Fred. “An Experimental and Critical Study of the Intelligence of Negro 
and White Kindergarten Children.” Journal of Genetic Psychology, LXV 
(1944), 161-175. 

A group of 341 native white children of Minneapolis were compared on the 
Stanford-Binet, Form L, with 91 Negro children of the same city. The mean age 
for the white group was 69.51 months as compared with a mean age of 69.15 
months for the Negroes. The mean I.Q.’s for the white and Negro groups were 
107.06 and 100.70, respectively. A comparison of the intelligence of the two groups 
at various occupational levels reveals that the total Negro group resembles the 
white group at the semi-skilled and unskilled labor class. The results differ from 
previous studies. The conclusion of the author is that the developmental con- 
striction of the Negroes is based upon cultural factors. Betty Steele. 

on the principle of minimal discrepancy. (Courtesy Psychometrika.) 

unitary traits in a situation usually reduces to finding a satisfactory rotation in a 
Thurstone centroid analysis. Seven principles, three of which are new, are described 
whereby rotation may be determined and/or judged. It is argued that the most 
fundamental is the principle of “parallel proportional profiles” or “simultaneous 
simple structure.” A mathematical proof of the uniqueness of determination by 
this means is attempted and equations are suggested for discovering the unique 
position. (Courtesy Psychometrika.) 





Goldfarb, William. “Adolescent Performance in the Wechsler-Bellevue Intelligence 
Scales and the Revised Stanford-Binet Examination, Form L.” Journal of 
Educational Psychology, XXXV (1944), 503-507. 

Scores of 60 adolescents living in foster homes and dependent for various periods 
of time, were correlated on the Revised Stanford-Binet, Form L, and the Wechsler- 
Bellevue Scale. The study confirmed the significant correlations between the I.Q. 
ratings on the two tests, but, unlike the findings of previous studies, the Wechsler- 
Bellevue I.Q. tended to be lower at [illegible] levels, especially so among chil- 
dren with Wechsler-Bellevue I.Q. of 110 or higher. This confirmed the author’s 
practical experience that the Wechsler-Bellevue Test appears to be poor in dis- 
criminating the superior adolescents. He believes that, while test dispersion may 
partly explain the differences in I.Q. between the two tests, there is also a difference 
in the mental patterns of the groups studied. Therefore, he does not advocate a 
single regression formula derived from small samplings. E. C. Bell. 





Havighurst, Robert J. and Hilkevitch, Rhea R. “The Intelligence of Indian Chil- 
dren as Measured by a Performance Scale.” Journal of Abnormal and Social 
Psychology, XXXIX (1944), 419-433. 

In order to find out the ways in which the children of several Indian tribes 
varied from tribe to tribe and from community to community within a tribe, and 
also to compare their scores with those of white children, 670 Indian children rang- 
ing in age from 6 through 15 were tested on a shortened form of the Grace Arthur 
Point Performance Scale. The Arthur Performance Scale was used because previous 
studies have shown it to be relatively culture-free. It was found that Indian 
children did about as well as white children, and that tribal and community dif- 
ferences exist just as in various groups in a white population. There was some 
indication that children from tribes little influenced by white culture did not do 
so well on the test, but there was no evidence to support the statement that Indian 
children work more slowly than white children. It is concluded that with Indian 
children a performance test is a better instrument than a test requiring use of 
the English language. Lorraine Bouthilet. 

Holzinger, Karl J. “A Simple Method of Factor Analysis.” Psychometrika, IX 
(1944), 257-262. 

A simple method for extracting correlated factors simultaneously is described. 
The method is based on the idea that the centroid pattern coefficients for the sec- 
tions of unit rank of the complete matrix may be interpreted as structure values 
for the entire matrix. Only the routine centroid average process is required. 


(Courtesy Psychometrika.) 





Klugman, Samuel F. “Test Scores for Clerical Aptitude and Interests Before and 
After a Year of Schooling.” Journal of Genetic Psychology, LXV (1944), 

To determine whether test scores for clerical aptitude and interests, and the 
relationship between these two, remain the same after a year’s schooling, 207 white, 
female, native-born students in commercial courses of a vocational high school were 
tested and, after 2 semesters’ training, retested on appropriate portions of the 
Strong Interest Blank and the Minnesota Clerical Aptitude Test. A comparison of 
scores from the 30 oldest and a like number of the youngest indicated that the 
general improvement in scores noted for most subjects is probably due to schooling 
rather than maturation, since no reliable difference between means was found. 
Correlation between scores on the same tests one year apart revealed high relation- 
ship for clerical aptitude and substantial relationship for clerical interest. Vernon 
S. Tracht. 





Krugman, Morris. “Recent Developments in Clinical Psychology.” Journal of 

Consulting Psychology, VIII (1944), 342-352. 

Two general trends in clinical psychology during the war period are observed 
by the author: 1) Halt in research on new clinical techniques, and 2) Great advance 
in experimentation in and use of short procedures including group tests and screen- 
ing methods. The Army’s mental hygiene units are “child-guidance” clinics (for 
soldiers), emphasizing test patterning and diagnosis, factor analysis in evaluation 
of test batteries, and increased interest in projective techniques, especially the 
Rorschach and the Thematic Apperception Test. Abbreviated individual and group 
techniques are being developed for them. There is a corresponding loss of interest 
in personality questionnaire tests. Clinical psychologists are emphasizing diagnosis 
and neglecting psychotherapy. E. C. Bell. 





Richardson, Marion W. “The Interpretation of a Test Validity Coefficient in 
Terms of Increased Efficiency of a Selected Group of Personnel.” Psycho- 
metrika, IX (1944), 245-248. 

The predictive efficiency of a test used to select personnel is defined in terms 
of total effectiveness of the group thus selected, as compared with chance selection. 
The formula developed requires the use of an estimate of the ratio of average 
effectiveness of men selected to the average effectiveness of men not selected by 
the test. The predictive efficiency of the test varies directly with the magnitude of 
this ratio and also directly with the percentage rejected. (Courtesy Psychometrika.) 





Sadowsky, Michael A. “Mathematical Analysis in Psychology of Education: Com- 
putation of Stimulation, Rapport, and Instructor’s Driving Power.” Psycho- 
metrika, IX (1944), 249-256. 

Mathematical expressions are derived for such concepts as stimulation of 
student by instructor, student-instructor rapport, and driving power of instructor, 
in terms of the student’s and the instructor’s foci of attention, their strength of 
concentration, and the intensity of the presentation and of the reception of details 
of subject matter. Under the assumption of normal distribution, the mathematical 
methods of combination and integration yield conclusions on summary integral 
effects of interrelations within the educational team. The psychological interpre- 
tation of the mathematical results thus obtained conforms with common sense. 
The main emphasis of the article is the exposition of how the mathematical method 
of combination and integration can be used to estimate the resultant effect of 
various independent combined simple factors acting independently within the in- 
dividuals forming the educational team. No claim is made as to the absolute 
truthfulness and reliability of the psychological postulates used at the beginning 
stage of the mathematical analysis. (Courtesy Psychometrika.) 





Spinelle, Leo and Nemzek, Claude L. “The Relationship of Personality Test Scores 
to School Marks and Intelligence Test Scores.” Journal of Social Psychology, 
XX (1944), 289-294. 

Results of a study undertaken to investigate the usefulness of the Link In- 
ventory of Interests and Activities for prediction of success in school showed that, 
with junior-high-school girls, the measures yielded by Link’s scale “do not possess 
direct value for educational guidance.” It appeared from the fairly high correla- 
tion between intelligence quotients and school marks that the intelligence quotient 
could be used for group, but not individual, prediction of scholastic success, and 
that the Link Inventory should be considered as an objective questionnaire giving 
information to serve as a basis for discussion in personal interviews in a mental 
hygiene program. Lorraine Bouthilet. 





Staff, Personnel Research Section, Classification and Replacement Branch, Adjutant 
General’s Office. “The New Army Individual Test of General Mental Ability.” 
Psychological Bulletin, XLI (1944), 532-538. 

A new individual test of general learning ability was prepared in response to 
many requests from psychologists in the military services, especially those working 
in Special Training Units, Replacement Training Centers, and Army hospitals and 
convalescent centers. Seventeen verbal and non-verbal tests were tried out, the 
reliability estimated according to the Kuder-Richardson formula, and validation 
carried out with the Army General Classification Test as the criterion. Three verbal 
tests and three non-verbal were chosen on the basis not only of statistical con- 
siderations but also of several practical requirements making the test applicable for 
Army use. The test was standardized, and norms are given in terms of standard 
scores and Army grades. Lorraine Bouthilet. 





Wallen, Richard. “Some Testing Needs in Military Clinical Psychology.” Psycho- 

logical Bulletin, XLI (1944), 539-542. 

Tests developed in civilian life are sometimes not applicable to military needs, 
especially in the task of testing recruits. Most published tests are too long, too 
dependent on a high level of reading ability, and too much time is needed for 
scoring and interpretation. A test for recruits should have easily understandable 
directions, the performance required should be simple, and the reliability and 
validity should be based on appropriate norms. It is possible to construct such a 
test because the problem is primarily one of discrimination at only one end of 
the trait continuum—that is, of determining men who are not suitable for military 
service. Since the purpose of the test is to weed out the grossly atypical indi- 
viduals, items to which a large proportion of the population respond in a given 
way are most useful. Promising results have been obtained in a few exploratory 
studies. Lorraine Bouthilet. 


Wellman, Beth L. “Binet I.Q. Changes of Orphanage Preschool Children: A Re-Analysis.” 
Journal of Genetic Psychology, LXV (1944), 239-26 

A pre-school and a control group of 47 and 44 children, respectively, were 
given the Stanford-Binet tests at the beginning and end of the project period which 
ranged from 77 to 972 days. The mean age for the pre-school group was 40.3 
months as compared to 40.0 months for the control group. The mean I.Q. of the 
pre-school group was 86.9 while that of the control group was 83.5. The results 
reaffirm the original study, indicating that the pre-school child with regular attend- 
ance, and in residence more than a year, made significantly better progress in 
intelligence than the child of equal initial intelligence, and in residence for a similar 
period, who did not attend pre-school. Betty Steele. 








Wherry, Robert J. “Maximal Weighting of Qualitative Data.” Psychometrika, 
IX (1944), 263-266. 
A method whereby biographical or other questionnaire data of a purely quali- 
tative nature may be used to predict success or failure on an independent criterion 
is presented. The method is not new but the present least-squares derivation and 
the transformation equation for punched card coding were not available in the 
literature. The proper weights are found to be proportional to the per cent of 
passers in the various categories. The method is suggested as a suitable substitute 
for non-linear approaches in connection with purely quantitative data as well. The 
implications of reweighting in connection with multiple regression are discussed. 
The lavish use of degrees of freedom makes cross-validation extremely desirable. 


(Courtesy Psychometrika.) 
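
The weighting rule stated in the abstract, weights proportional to the per cent of passers in each category, might be sketched as below. This is an illustration under assumptions, not Wherry's derivation: the function name, the item categories, and the pass/fail data are all hypothetical, and the proportionality constant is simply taken as one.

```python
from collections import defaultdict


def category_weights(responses, passed):
    """Weight each response category by its proportion of criterion passers."""
    totals = defaultdict(int)
    passes = defaultdict(int)
    for category, ok in zip(responses, passed):
        totals[category] += 1
        passes[category] += int(ok)
    return {category: passes[category] / totals[category] for category in totals}


# Hypothetical answers to one biographical item and criterion outcomes.
responses = ["single", "married", "single", "married", "single"]
passed = [True, False, True, True, False]
print(category_weights(responses, passed))
```

With many items and categories each weight is estimated from the same sample it is applied to, which is why the abstract stresses cross-validation.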


Wherry, Robert J. and Gaylord, Richard H. “Factor Pattern of Test Items and 
Tests as a Function of the Correlation Coefficient: Content, Difficulty, and 
Constant Error Factors.” Psychometrika, IX (1944), 237-244. 

A dilemma was created for factor analysts by Ferguson (Psychometrika, 1941, 
6, 323-329) when he demonstrated that test items or sub-tests of varying difficulty 
will yield a correlation matrix of rank greater than 1, even though the material 
from which the items or sub-tests are drawn is homogeneous, although homogeneity 
of such material had been defined operationally by factor analysts as having a 
correlation matrix of rank 1. This dilemma has been resolved as a case of 
ambiguity, which lay in (1) failure to specify whether homogeneity was to apply 
to content, difficulty, or both, and (2) failure to state explicitly the kind of corre- 
lation to be used in obtaining the matrix. It is demonstrated that (1) if the 
material is homogeneous in both respects, the type of coefficient is immaterial, but 
(2) if content is homogeneous but difficulty is not, the homogeneity of the content 
can be demonstrated only by using the tetrachoric correlation coefficient in deriving 
the matrix; and that the use of the phi-coefficient (Pearsonian r) will disclose only 
the non-homogeneity of the difficulty and lead to a series of constant error factors 
as contrasted with content factors. Since varying difficulty of items (and possibly 
sub-tests) is desirable as well as practically unavoidable, it is recommended that all 
factor analysis problems be carried out with tetrachoric correlations. While no 
one would want to obtain the constant error factors by factor analysis (difficulty 
being more easily obtained by counting passes), their importance for test con- 
struction is pointed out. (Courtesy Psychometrika.) 
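
The effect of unequal difficulty on the phi-coefficient can be illustrated with a small computation. The sketch below is not taken from the article: the fourfold tables are hypothetical, constructed so that every person who passes the harder item also passes the easier one.

```python
import math


def phi_coefficient(a, b, c, d):
    """Phi for a fourfold table.

    a = pass both items, b = pass item 1 only,
    c = pass item 2 only, d = fail both items.
    """
    numerator = a * d - b * c
    denominator = math.sqrt((a + b) * (c + d) * (a + c) * (b + d))
    return numerator / denominator


# Equal difficulty (50 per cent pass each item), perfectly consistent responses.
print(phi_coefficient(50, 0, 0, 50))   # 1.0

# Unequal difficulty (80 per cent vs. 20 per cent passing), still perfectly
# consistent: no one passes the hard item while failing the easy one.
print(phi_coefficient(20, 60, 0, 20))  # 0.25
```

Although both tables reflect perfectly consistent responding, phi falls to .25 when the difficulties differ, which is the kind of difficulty artifact the abstract attributes to the phi-coefficient.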

















